P2RANK: Knowledge-Based Ligand Binding Site Prediction Using Aggregated Local Features

  • Radoslav Krivák
  • David Hoksza
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9199)


Knowledge of protein-ligand binding sites is a vital prerequisite for any structure-based virtual screening campaign. If no prior knowledge about binding sites is available, ligand-binding site prediction methods are the only way to obtain the necessary information. Here we introduce P2RANK, a novel machine learning-based method for predicting ligand binding sites from protein structure. P2RANK uses a Random Forests learner to infer the ligandability of local chemical neighborhoods near the protein surface. These neighborhoods are represented by specific near-surface points and described by aggregating physico-chemical features projected onto those points from neighboring protein atoms. The points with high predicted ligandability are clustered and ranked to obtain the resulting list of binding site predictions. The new method was compared with Fpocket, a state-of-the-art binding site prediction method, on three representative datasets. The results show that P2RANK outperforms Fpocket by 10 to 20 percentage points on all the datasets. Moreover, since P2RANK does not rely on any external software to compute complex features such as sequence conservation scores or binding energies, it is an ideal tool for inclusion in future structural bioinformatics pipelines.
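The steps named in the abstract (project atom features onto near-surface points, score each point, then cluster and rank the high-scoring points) can be illustrated with a minimal Python sketch. It is not the P2RANK implementation. The linear distance weighting, the 6.0 and 3.0 distance cutoffs, the single-linkage clustering, the sum-of-squares cluster score, and every function name are assumptions made for illustration. In particular, `toy_ligandability` is a hand-weighted logistic function standing in for the trained Random Forests model.

```python
import math


def aggregate_features(point, atoms, radius=6.0):
    """Project atom features onto one surface point.

    `atoms` is a list of (position, feature_tuple) pairs. Each atom within
    `radius` contributes its features scaled by 1 - d/radius. Both the
    radius and the weighting are illustrative assumptions.
    """
    size = len(atoms[0][1])
    total = [0.0] * size
    for position, features in atoms:
        d = math.dist(point, position)
        if d < radius:
            weight = 1.0 - d / radius
            for i, value in enumerate(features):
                total[i] += weight * value
    return total


def toy_ligandability(features, weights):
    """Stand-in scorer (NOT a Random Forest): logistic of a weighted sum."""
    z = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def cluster_points(points, cutoff=3.0):
    """Single-linkage clustering: points closer than `cutoff` share a cluster."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            near = [j for j in unvisited
                    if math.dist(points[current], points[j]) <= cutoff]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return clusters


def predict_sites(points, atoms, weights, threshold=0.5, cutoff=3.0):
    """Score points, keep those at or above `threshold`, cluster, and rank.

    Returns (site_score, point_indices) pairs, best first. The site score is
    the sum of squared point scores, which is an assumed ranking rule.
    """
    scores = [toy_ligandability(aggregate_features(p, atoms), weights)
              for p in points]
    kept = [i for i, s in enumerate(scores) if s >= threshold]
    clusters = cluster_points([points[i] for i in kept], cutoff)
    sites = []
    for cluster in clusters:
        members = [kept[k] for k in cluster]
        sites.append((sum(scores[i] ** 2 for i in members), members))
    return sorted(sites, reverse=True)


# Hypothetical toy input: two atoms with one feature each, four surface points.
atoms = [((0.0, 0.0, 0.0), (1.0,)), ((20.0, 0.0, 0.0), (1.0,))]
points = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (21.0, 0.0, 0.0), (50.0, 0.0, 0.0)]
sites = predict_sites(points, atoms, weights=(2.0,), threshold=0.6)
```

In this toy input the two points near the first atom merge into one predicted site and outrank the single point near the second atom, while the distant fourth point falls below the threshold and is discarded. A realistic version would replace `toy_ligandability` with a classifier trained on points labeled by their proximity to known ligands.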


Ligand-binding site prediction, Protein structure, Molecular recognition, Machine learning, Random forest



This work was supported by the Czech Science Foundation (grant 14-29032P), by project SVV-2015-260222, and by Charles University in Prague (project GA UK No. 174615).



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. FMP, Department of Software Engineering, Charles University in Prague, Prague, Czech Republic
