A Semi-supervised Graph Deep Neural Network for Automatic Protein Function Annotation

  • Conference paper
Bioinformatics and Biomedical Engineering (IWBBIO 2022)

Abstract

Protein function annotation based on functional properties such as Enzyme Commission (EC) numbers is a challenging task that aims to understand life at the molecular level. In particular, the feature vector of each protein is very high-dimensional and the number of labeled samples is limited, both of which can significantly reduce annotation accuracy. To address these issues, we propose a novel semi-supervised graph deep learning model that learns better latent representations for each protein (node) by taking neighborhood information into account, in order to improve the annotation. First, we extract a set of features from raw protein data. Each protein is associated with a 1-D feature vector that represents its InterPro domain composition. Because D, the number of possible InterPro domains, is very high (>11,000), we design a deep autoencoder (DAE) model that seeks an efficient representation of the domain composition of proteins in a lower-dimensional latent space. Then, we construct a protein graph in which each node is a protein associated with its latent representation vector and each edge is weighted by the Euclidean distance between the two nodes it connects. Finally, we train a semi-supervised graph neural network (SGNN) for automatic protein function annotation on the constructed protein graph. Experiments are conducted on four reference proteomes in UniProtKB/SwissProt: Human, Arabidopsis thaliana, Mouse, and Rat. The results show that the proposed model is competitive with existing methods for protein function annotation.
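
The three stages described in the abstract (dimensionality reduction, graph construction, graph-based propagation) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the toy data are random, a linear SVD projection stands in for the deep autoencoder, a k-nearest-neighbour rule is assumed for choosing edges, and the propagation step follows the graph convolution of Kipf and Welling rather than the paper's SGNN. All function names and parameters (`latent_projection`, `knn_distance_graph`, `gcn_layer`, `k`, `dim`) are hypothetical.

```python
import numpy as np

def latent_projection(X, dim):
    """Stand-in for the DAE (assumption): project onto the top `dim` principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:dim].T

def knn_distance_graph(Z, k):
    """Symmetric k-NN graph (assumed edge rule); weights are Euclidean distances."""
    diff = Z[:, None, :] - Z[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    n = Z.shape[0]
    A = np.zeros((n, n))  # binary adjacency
    W = np.zeros((n, n))  # distance-weighted edges
    for i in range(n):
        neighbours = [j for j in np.argsort(dist[i]) if j != i][:k]
        A[i, neighbours] = 1.0
        W[i, neighbours] = dist[i, neighbours]
    # Keep an edge if either endpoint selected the other.
    return np.maximum(A, A.T), np.maximum(W, W.T)

def gcn_layer(A, H, theta):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H theta)."""
    A_self = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_self.sum(axis=1))
    A_hat = A_self * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_hat @ H @ theta, 0.0)

rng = np.random.default_rng(0)
X = (rng.random((20, 50)) < 0.1).astype(float)  # toy binary domain-composition matrix
Z = latent_projection(X, dim=8)                 # latent vector per protein
A, W = knn_distance_graph(Z, k=3)               # protein graph
theta = rng.normal(size=(8, 4))                 # untrained layer weights
H = gcn_layer(A, Z, theta)                      # neighbourhood-aware node features
```

In a semi-supervised setting, `theta` would be trained with a classification loss computed only on the labeled proteins, while the propagation through `A` lets unlabeled neighbours influence the learned representations. That training loop is omitted here.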

Author information

Corresponding author

Correspondence to Akrem Sellami.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Sellami, A., Sarker, B., Tabbone, S., Devignes, M.-D., Aridhi, S. (2022). A Semi-supervised Graph Deep Neural Network for Automatic Protein Function Annotation. In: Rojas, I., Valenzuela, O., Rojas, F., Herrera, L.J., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2022. Lecture Notes in Computer Science, vol 13347. Springer, Cham. https://doi.org/10.1007/978-3-031-07802-6_14

  • DOI: https://doi.org/10.1007/978-3-031-07802-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-07801-9

  • Online ISBN: 978-3-031-07802-6

  • eBook Packages: Computer Science, Computer Science (R0)