Exploiting Complex Protein Domain Networks for Protein Function Annotation

  • Bishnu Sarker
  • David W. Ritchie
  • Sabeur Aridhi
Conference paper
Part of the Studies in Computational Intelligence book series (SCI, volume 813)

Abstract

Huge numbers of protein sequences are now available in public databases. To exploit this valuable biological data more fully, these sequences need to be annotated with functional properties such as Enzyme Commission (EC) numbers and Gene Ontology terms. The UniProt Knowledgebase (UniProtKB) is currently the largest and most comprehensive resource for protein sequence and annotation data. In the March 2018 release of UniProtKB, some 556,000 sequences have been manually curated, but over 111 million sequences still lack functional annotations. The ability to annotate these sequences automatically would represent a major advance for the field of bioinformatics. Here, we present a novel network-based approach called GrAPFI for the automatic functional annotation of protein sequences. The underlying assumption of GrAPFI is that proteins may be related to each other by the protein domains, families, and super-families that they share. Several protein domain databases exist, including InterPro, Pfam, SMART, CDD, Gene3D, and Prosite. Our approach uses InterPro domains because the InterPro database contains information from several other major protein family and domain databases. Our results show that GrAPFI achieves better EC number annotation performance than several other previously described approaches.
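
The idea of relating proteins through shared domains and then transferring labels across the resulting network can be illustrated with a small sketch. This is not the GrAPFI implementation: the abstract does not state how edges are weighted or how labels are propagated, so the Jaccard edge weight and the single-step weighted neighbour vote below are assumptions chosen for illustration. The protein names, domain identifiers, and EC assignments are hypothetical toy data.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed edge weight, not from the paper)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_domain_graph(domains):
    """Link every pair of proteins that share at least one domain."""
    graph = defaultdict(dict)
    proteins = sorted(domains)
    for i, p in enumerate(proteins):
        for q in proteins[i + 1:]:
            weight = jaccard(domains[p], domains[q])
            if weight > 0:
                graph[p][q] = weight
                graph[q][p] = weight
    return graph


def predict_label(graph, labels, query):
    """Return the label with the highest summed edge weight among the
    query's labelled neighbours, or None if it has no labelled neighbour."""
    scores = defaultdict(float)
    for neighbour, weight in graph.get(query, {}).items():
        if neighbour in labels:
            scores[labels[neighbour]] += weight
    if not scores:
        return None
    return max(scores, key=scores.get)


# Hypothetical toy data: domain sets per protein and known EC labels.
domains = {
    "protA": {"IPR_X1", "IPR_X2"},
    "protB": {"IPR_X1", "IPR_X2", "IPR_X3"},
    "protC": {"IPR_X4"},
    "query": {"IPR_X1", "IPR_X2"},
    "orphan": {"IPR_X9"},
}
labels = {"protA": "1.1.1.1", "protB": "1.1.1.1", "protC": "2.7.11.1"}

graph = build_domain_graph(domains)
print(predict_label(graph, labels, "query"))
print(predict_label(graph, labels, "orphan"))
```

In this toy graph the query protein shares domains only with protA and protB, so it inherits their label, while the orphan protein shares no domain with any labelled protein and receives no prediction. The all-pairs comparison is quadratic and would need an inverted domain-to-protein index to scale to a database the size of UniProtKB.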

Keywords

Complex protein domain networks; Protein function annotation; Label propagation; GrAPFI; Bioinformatics

Notes

Acknowledgements

This work was partially supported by the CNRS-INRIA/FAPs project “TempoGraphs” (PRC2243). Bishnu Sarker is a doctoral student funded by an INRIA CORDI-S contract.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Bishnu Sarker (1)
  • David W. Ritchie (1)
  • Sabeur Aridhi (1)
  1. University of Lorraine, CNRS, Inria, LORIA, Nancy, France
