Abstract
Protein function prediction from sequence using the Gene Ontology (GO) classification is useful in many biological problems. It has recently attracted increasing interest, thanks in part to the Critical Assessment of Function Annotation (CAFA) challenge. In this paper, we introduce Guilty by Association on STRING (GAS), a tool that predicts protein function from protein–protein interaction (PPI) networks without relying on sequence similarity. The underlying assumption is that a protein takes part in the same biological process, and is located in the same cellular compartment, as the proteins it interacts with. GAS retrieves the interaction partners of a query protein from the STRING database and measures the enrichment of their functional annotations to generate a sorted list of putative functions. We provide a performance evaluation based on CAFA metrics and a fair comparison with optimized BLAST similarity searches. The consensus of GAS and BLAST is shown to improve overall performance, and the PPI approach is shown to outperform similarity searches for biological process and cellular compartment GO predictions. An analysis of best practices for exploiting protein–protein interaction networks is also provided.
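The ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which enrichment statistic GAS uses, so the hypergeometric tail probability below is an assumption, not the authors' method. The protein identifiers, the GO assignments, and the helper names `hypergeom_sf` and `rank_terms` are hypothetical, and no STRING query is performed: the partner list is supplied by hand.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N proteins are drawn from M, of which n carry the term."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def rank_terms(partners, annotations):
    """Rank GO terms by enrichment among partners (smallest p-value first).

    Assumes every partner is also a key of `annotations` (the background).
    """
    M = len(annotations)   # background size
    N = len(partners)      # number of interaction partners

    # How many background proteins carry each term.
    background = {}
    for terms in annotations.values():
        for t in terms:
            background[t] = background.get(t, 0) + 1

    # How many partners carry each term.
    observed = {}
    for p in partners:
        for t in annotations.get(p, ()):
            observed[t] = observed.get(t, 0) + 1

    scored = [(t, hypergeom_sf(k, M, background[t], N))
              for t, k in observed.items()]
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))


# Toy background: eight hypothetical proteins with GO terms assigned by hand.
annotations = {
    "P1": {"GO:0006260"},
    "P2": {"GO:0006260", "GO:0005634"},
    "P3": {"GO:0006260"},
    "P4": {"GO:0005634"},
    "P5": {"GO:0005737"},
    "P6": {"GO:0005737"},
    "P7": {"GO:0005737"},
    "P8": {"GO:0005634"},
}

# Hypothetical interaction partners of the query protein.
partners = ["P1", "P2", "P3"]

ranked = rank_terms(partners, annotations)
for term, p_value in ranked:
    print(term, round(p_value, 4))
```

In this toy case the term shared by all three partners ranks first, because finding it on every partner is unlikely by chance against the eight-protein background. A real pipeline would also need to choose a background set, propagate annotations up the GO hierarchy, and correct for multiple testing. None of those choices is specified in the abstract.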
Acknowledgments
The authors are grateful to members of the BioComputing UP lab for insightful discussions. This project was funded by FIRB Futuro in Ricerca grant RBFR08ZSXY, University of Padua grant CPDR123473, and AIRC grant MFAG12740 to S.T. D.P. is funded by FIRC Fondazione Italiana per la Ricerca sul Cancro project no. 16621.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Piovesan, D., Giollo, M., Ferrari, C. et al. Protein function prediction using guilty by association from interaction networks. Amino Acids 47, 2583–2592 (2015). https://doi.org/10.1007/s00726-015-2049-3
DOI: https://doi.org/10.1007/s00726-015-2049-3