An overview of in silico protein function prediction

  • Mini-Review
  • Archives of Microbiology

Abstract

As protein databases continue to expand at an exponential rate, fed by daily uploads from multiple large-scale genomic and metagenomic projects, the problem of assigning a function to each new protein has become the focus of significant research interest. Herein, we review the most recent advances in the field of automated function prediction (AFP). We begin by defining what is meant by biological “function” and the means of describing such functions using standardised, machine-readable ontologies. We then focus on the various function-prediction programs available, both sequence- and structure-based, and outline their associated strengths and weaknesses. Finally, we conclude with a brief overview of the future challenges and outstanding questions that remain in the field.

Acknowledgments

The authors wish to acknowledge the financial assistance of the Faculty of Engineering and Science, the Department of Biological Sciences and the Department of Computing at Cork Institute of Technology.

Author information

Corresponding author

Correspondence to Roy D. Sleator.

Additional information

Communicated by Erko Stackebrandt.

About this article

Cite this article

Sleator, R.D., Walsh, P. An overview of in silico protein function prediction. Arch Microbiol 192, 151–155 (2010). https://doi.org/10.1007/s00203-010-0549-9

  • DOI: https://doi.org/10.1007/s00203-010-0549-9
