Abstract
Annotating proteins with functional properties such as Enzyme Commission (EC) numbers is a challenging task that aims to understand life at the molecular level. In particular, the feature space of each protein is very large while the number of labeled samples is limited, which can significantly reduce annotation accuracy. To address these issues, we propose a novel semi-supervised graph deep learning model that learns better latent representations for each protein (node) by taking neighborhood information into account, in order to improve annotation. First, we extract a set of features from raw protein data: each protein is associated with a one-dimensional feature vector of length D that represents its InterPro domain composition. Because D, the number of possible InterPro domains, is very high (>11,000), we design a deep autoencoder (DAE) that seeks an efficient representation of the domain composition of proteins in a lower-dimensional latent space. Then, we construct a protein graph in which each node is a protein associated with its latent representation vector and each edge is weighted by the Euclidean distance between the two nodes it connects. Finally, we train a semi-supervised graph neural network (SGNN) on the constructed protein graph for automatic protein function annotation. Experiments are conducted on four reference proteomes in UniProtKB/SwissProt: Human, Arabidopsis thaliana, Mouse, and Rat. The results show that the proposed model is competitive with existing methods for protein function annotation.
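The graph-construction step described in the abstract can be illustrated with a minimal sketch. The abstract states only that edges are weighted by the Euclidean distance between latent vectors; it does not say how edges are selected. The k-nearest-neighbour rule, the function name `build_protein_graph`, the protein identifiers, and the toy 2-D latent vectors below are all illustrative assumptions and not the authors' implementation.

```python
import math
from typing import Dict, List, Tuple


def build_protein_graph(
    latent: Dict[str, List[float]], k: int = 2
) -> Dict[Tuple[str, str], float]:
    """Build an undirected graph whose edge weights are Euclidean distances.

    Assumption (not from the paper): each protein is linked to its k nearest
    neighbours in latent space. Edges are keyed by a sorted (u, v) pair.
    """
    edges: Dict[Tuple[str, str], float] = {}
    for u, zu in latent.items():
        # Distances from protein u to every other protein, nearest first.
        dists = sorted(
            (math.dist(zu, zv), v) for v, zv in latent.items() if v != u
        )
        for d, v in dists[:k]:
            key = (u, v) if u < v else (v, u)
            edges[key] = d  # edge weight = Euclidean distance
    return edges


# Hypothetical latent vectors, standing in for the autoencoder output.
latent = {
    "P1": [0.0, 0.0],
    "P2": [3.0, 4.0],
    "P3": [0.0, 1.0],
    "P4": [10.0, 10.0],
}
graph = build_protein_graph(latent, k=1)
for (u, v), w in sorted(graph.items()):
    print(u, v, round(w, 3))
```

In this toy setting each node keeps only its single nearest neighbour, so the graph stays sparse. A real pipeline would pass the resulting weighted graph, together with the partially labeled nodes, to the semi-supervised graph neural network.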
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Sellami, A., Sarker, B., Tabbone, S., Devignes, MD., Aridhi, S. (2022). A Semi-supervised Graph Deep Neural Network for Automatic Protein Function Annotation. In: Rojas, I., Valenzuela, O., Rojas, F., Herrera, L.J., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2022. Lecture Notes in Computer Science, vol 13347. Springer, Cham. https://doi.org/10.1007/978-3-031-07802-6_14
DOI: https://doi.org/10.1007/978-3-031-07802-6_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-07801-9
Online ISBN: 978-3-031-07802-6
eBook Packages: Computer Science, Computer Science (R0)