Archives of Microbiology, Volume 192, Issue 3, pp 151–155

An overview of in silico protein function prediction



As protein databases continue to expand at an exponential rate, fed by daily uploads from multiple large-scale genomic and metagenomic projects, the problem of assigning a function to each new protein has become the focus of significant research interest. Herein, we review the most recent advances in the field of automated function prediction (AFP). We begin by defining what is meant by biological “function” and the means of describing such functions using standardised, machine-readable ontologies. We then focus on the various function-prediction programs available, both sequence-based and structure-based, and outline their associated strengths and weaknesses. Finally, we conclude with a brief overview of the future challenges and outstanding questions in the field.


Protein function · Homology-based transfer · Ontologies · Sequence and structure · Motifs



Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  1. Department of Biological Sciences, Cork Institute of Technology, Cork, Ireland
  2. Department of Computing, Cork Institute of Technology, Cork, Ireland
  3. CIT Bioinformatics Group, Cork Institute of Technology, Cork, Ireland
