Amino Acids, Volume 47, Issue 12, pp 2583–2592

Protein function prediction using guilty by association from interaction networks

  • Damiano Piovesan
  • Manuel Giollo
  • Carlo Ferrari
  • Silvio C. E. Tosatto
Original Article


Protein function prediction from sequence using the Gene Ontology (GO) classification is useful in many biological problems. It has recently attracted increasing interest, thanks in part to the Critical Assessment of Function Annotation (CAFA) challenge. In this paper, we introduce Guilty by Association on STRING (GAS), a tool that predicts protein function from protein–protein interaction (PPI) networks without relying on sequence similarity. The underlying assumption is that a protein takes part in the same biological process, and is located in the same cellular compartment, as the proteins it interacts with. GAS retrieves the interaction partners of a query protein from the STRING database and measures the enrichment of their functional annotations to generate a ranked list of putative functions. A performance evaluation based on CAFA metrics and a fair comparison with optimized BLAST similarity searches are provided. The consensus of GAS and BLAST is shown to improve overall performance, and the PPI approach is shown to outperform similarity searches for biological process and cellular compartment GO predictions. An analysis of best practices for exploiting protein–protein interaction networks is also provided.
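The guilty-by-association idea described above can be illustrated with a minimal Python sketch. The abstract does not state which enrichment statistic GAS uses or how STRING confidence scores enter the calculation, so the one-sided hypergeometric test below is an assumption chosen for illustration, not the authors' implementation. The function names (hypergeom_sf, rank_terms), the protein identifiers (Q, A to F) and the toy network are hypothetical; in practice the partners would come from STRING and the annotations from a GO annotation resource.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n proteins from N, of which K carry the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def rank_terms(query, network, annotations):
    """Rank GO terms by enrichment among the query's annotated partners.

    network:     protein -> set of interaction partners (toy stand-in for STRING)
    annotations: protein -> set of GO term identifiers
    Returns a list of (term, p_value) sorted from most to least enriched.
    """
    # Only annotated partners can contribute evidence; the query is excluded.
    partners = {p for p in network.get(query, set())
                if p != query and p in annotations}
    background = [p for p in annotations if p != query]
    N, n = len(background), len(partners)
    if n == 0:
        return []
    # Candidate terms are those seen on at least one partner.
    candidates = set().union(*(annotations[p] for p in partners))
    scored = []
    for term in candidates:
        K = sum(1 for p in background if term in annotations[p])
        k = sum(1 for p in partners if term in annotations[p])
        scored.append((term, hypergeom_sf(k, N, K, n)))
    return sorted(scored, key=lambda item: (item[1], item[0]))

# Hypothetical toy data: query Q interacts with A, B and C.
network = {"Q": {"A", "B", "C"}}
annotations = {
    "A": {"GO:0006412"},
    "B": {"GO:0006412", "GO:0005737"},
    "C": {"GO:0006412"},
    "D": {"GO:0005737"},
    "E": {"GO:0005737"},
    "F": {"GO:0006260"},
}

ranked = rank_terms("Q", network, annotations)
for term, p_value in ranked:
    print(term, round(p_value, 3))
```

In this toy case the term shared by all three partners ranks first because it is rare in the background, while the term carried by a single partner and by many background proteins ranks last. A consensus with a similarity-based predictor, as the abstract mentions for BLAST, would need a rule for combining the two score lists, which is not specified here.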


Protein function; Protein interaction network; Gene ontology; CAFA; Protein sequence



The authors are grateful to members of the BioComputing UP lab for insightful discussions. This project was funded by FIRB Futuro in Ricerca grant RBFR08ZSXY, University of Padua grant CPDR123473, and AIRC grant MFAG12740 to S.T. D.P. is funded by FIRC Fondazione Italiana per la Ricerca sul Cancro project no. 16621.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.



Copyright information

© Springer-Verlag Wien 2015

Authors and Affiliations

  • Damiano Piovesan (1)
  • Manuel Giollo (1, 2)
  • Carlo Ferrari (2)
  • Silvio C. E. Tosatto (1, 3)

  1. Department of Biomedical Sciences, University of Padua, Padua, Italy
  2. Department of Information Engineering, University of Padua, Padua, Italy
  3. CNR Institute of Neuroscience, Padua, Italy
