Protein Function Prediction

Functional Genomics

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1654))

Abstract

Protein function is a concept that can have different interpretations in different biological contexts, and the number and diversity of novel proteins identified by large-scale “omics” technologies pose ever-growing challenges. In this review, we explore current strategies for predicting protein function, focusing on high-throughput sequence analysis, for example inference based on sequence similarity, sequence composition, structure, and protein–protein interaction. We discuss these prediction strategies together with illustrative workflows that highlight the use of some benchmark tools and knowledge bases in the field.

Author information

Corresponding author

Correspondence to Leonardo Magalhães Cruz.

Copyright information

© 2017 Springer Science+Business Media LLC

About this protocol

Cite this protocol

Cruz, L.M., Trefflich, S., Weiss, V.A., Castro, M.A.A. (2017). Protein Function Prediction. In: Kaufmann, M., Klinger, C., Savelsbergh, A. (eds) Functional Genomics. Methods in Molecular Biology, vol 1654. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7231-9_5

  • DOI: https://doi.org/10.1007/978-1-4939-7231-9_5

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-7230-2

  • Online ISBN: 978-1-4939-7231-9

  • eBook Packages: Springer Protocols
