In silico Identification and Characterization of Protein-Ligand Binding Sites

Part of the Methods in Molecular Biology book series (MIMB, volume 1414)


Protein–ligand binding site prediction methods aim to predict, from amino acid sequence, protein–ligand interactions, putative ligands, and ligand binding site residues, using sequence information, structural information, or a combination of both. In silico characterization of protein–ligand interactions has become extremely important for determining a protein’s functionality, as in vivo-based functional elucidation is unable to keep pace with the current growth of sequence databases. Additionally, in vitro biochemical functional elucidation is time-consuming, costly, and may not be feasible for large-scale analyses, such as drug discovery. Thus, in silico prediction of protein–ligand interactions must be utilized to aid functional elucidation. Here, we briefly discuss protein function prediction, the prediction of protein–ligand interactions, and the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and Continuous Automated Model EvaluatiOn (CAMEO) competitions, along with their role in shaping the field. We also discuss, in detail, our cutting-edge web-server method, FunFOLD, for the structurally informed prediction of protein–ligand interactions. Furthermore, we provide a step-by-step guide to using the FunFOLD web server and the FunFOLD3 downloadable application, along with some real-world examples in which the FunFOLD methods have been used to aid functional elucidation.

Key words

Protein function prediction; Protein–ligand interactions; Binding site residue prediction; Biochemical functional elucidation; Critical Assessment of Techniques for Protein Structure Prediction (CASP); Continuous Automated Model EvaluatiOn (CAMEO); Protein structure prediction; Structure-based function prediction; Quality assessment of protein–ligand binding site predictions



Daniel Barry Roche is a recipient of a Young Investigator Fellowship from the Institut de Biologie Computationnelle, Université de Montpellier (ANR Investissements D’Avenir Bio-informatique: projet IBC).



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Institut de Biologie Computationnelle, LIRMM, CNRS, Université de Montpellier, Montpellier, France
  2. Centre de Recherche en Biologie cellulaire de Montpellier, CNRS-UMR 5237, Montpellier, France
  3. School of Biological Sciences, University of Reading, Reading, UK
