Protein Subcellular Localization Prediction Model Based on Graph Convolutional Network

  • Original research article
Interdisciplinary Sciences: Computational Life Sciences

Abstract

Protein subcellular localization prediction is an important research area in bioinformatics and plays an essential role in understanding protein function and mechanism. Many machine learning and deep learning algorithms have been applied to this task, but most of them do not use the structural information of proteins. With recent advances in protein structure research, the accuracy of protein contact map prediction has improved dramatically. In this paper, we present GraphLoc, a deep learning model that predicts the localization of proteins at the subcellular level. The core of the model consists of a graph convolutional network (GCN) module and a multi-head attention module. The protein topology graph is constructed from a contact map predicted from the protein sequence and is used as the input of the GCN module to take full advantage of the structural information of proteins. The multi-head attention module learns the weighted contribution of different amino acids to subcellular localization in different feature representation subspaces. Experiments on the benchmark dataset show that our model outperforms the compared methods. The code can be accessed at https://github.com/GoodGuy398/GraphLoc.
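
As a rough illustration of how a predicted contact map can feed a graph convolution, the sketch below binarizes contact probabilities into an adjacency matrix and applies one propagation step of the widely used GCN rule ReLU(D^-1/2 (A + I) D^-1/2 H W). It is a minimal sketch under stated assumptions, not the GraphLoc implementation: the function names, the 0.5 threshold, and the choice of this particular normalization are all illustrative and are not taken from the abstract or the repository.

```python
import math


def contact_map_to_adjacency(contact_probs, threshold=0.5):
    """Binarize a predicted contact map into an undirected adjacency matrix.

    The threshold is an assumed value; self-contacts are dropped here and
    re-added as self-loops during normalization.
    """
    n = len(contact_probs)
    return [
        [
            1.0
            if i != j and max(contact_probs[i][j], contact_probs[j][i]) >= threshold
            else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, where D is the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [
        [a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(norm_adj, features, weights):
    """One graph convolution: ReLU(norm_adj @ features @ weights)."""
    hidden = matmul(matmul(norm_adj, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]
```

Each row of `features` would hold the per-residue input encoding, so after one layer every residue's representation mixes in the features of the residues it is predicted to contact.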

Graphical abstract

The proposed GraphLoc model consists of three parts. The first part is a graph convolutional network (GCN) module, which uses the predicted contact maps to construct the protein graph and thereby exploits the structural information of proteins. The second part is the multi-head attention module, which learns the weighted contribution of different amino acids in different feature representation subspaces and computes a weighted average of the feature map across all amino acid nodes. The last part is a fully connected layer that maps the flattened graph representation vector to a vector whose dimension equals the number of categories, followed by a softmax layer that predicts the protein subcellular localization.
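
The second and third parts can be sketched in a few lines. The example below assumes a structured self-attention form, in which each head scores every residue with a small tanh network, normalizes the scores over residues with a softmax, and averages the node features with those weights; the pooled heads are then flattened and passed through a fully connected layer and a softmax. The attention parameterization, the function names, and the weight shapes are assumptions made for illustration, not details confirmed by the text above.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(node_features, w1, w2):
    """Multi-head attention pooling over residue nodes.

    node_features: n x d node representations (e.g. GCN output).
    w1: d x h projection, w2: h x r projection, with r attention heads.
    Returns (attention, pooled): attention is r x n with each row summing
    to 1 over residues, pooled is r x d.
    """
    hidden = [
        [math.tanh(sum(x * w for x, w in zip(row, col))) for col in zip(*w1)]
        for row in node_features
    ]
    scores = [
        [sum(h * w for h, w in zip(h_row, col)) for col in zip(*w2)]
        for h_row in hidden
    ]
    attention = [softmax(list(head)) for head in zip(*scores)]
    dim = len(node_features[0])
    pooled = [
        [sum(a * node_features[i][c] for i, a in enumerate(head)) for c in range(dim)]
        for head in attention
    ]
    return attention, pooled


def classify(pooled, fc_weights, fc_bias):
    """Flatten the pooled heads, apply a dense layer, return class probabilities.

    fc_weights has shape (r * d) x num_classes; fc_bias has num_classes entries.
    """
    flat = [v for head in pooled for v in head]
    logits = [
        sum(x * w for x, w in zip(flat, col)) + b
        for col, b in zip(zip(*fc_weights), fc_bias)
    ]
    return softmax(logits)
```

Because each attention row is a distribution over residues, inspecting it is one way to see which amino acids a head weights most heavily for a given protein.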

Acknowledgements

This research was funded by the National Natural Science Foundation of China (61972174), the Key Research and Development Project of the Jilin Provincial Science and Technology Department (20210201080GX), the Jilin Province Development and Reform Commission (2021C044-1), and the Guangdong science and technology planning (2020A0505100018), Guangdong universities’ innovation team (2021KCXTDO15), and Guangdong key disciplines (2021ZDJS138) projects.

Author information

Corresponding author

Correspondence to Xiaohu Shi.

Ethics declarations

Conflict of Interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

About this article

Cite this article

Zhang, T., Gu, J., Wang, Z. et al. Protein Subcellular Localization Prediction Model Based on Graph Convolutional Network. Interdiscip Sci Comput Life Sci 14, 937–946 (2022). https://doi.org/10.1007/s12539-022-00529-9

  • DOI: https://doi.org/10.1007/s12539-022-00529-9
