P2RANK: Knowledge-Based Ligand Binding Site Prediction Using Aggregated Local Features
Knowledge of protein-ligand binding sites is a vital prerequisite for any structure-based virtual screening campaign. If no prior knowledge about binding sites is available, ligand-binding site prediction methods are the only way to obtain the necessary information. Here we introduce P2RANK, a novel machine learning-based method for predicting ligand binding sites from protein structure. P2RANK uses a Random Forests learner to infer the ligandability of local chemical neighborhoods near the protein surface. These neighborhoods are represented by specific near-surface points and described by aggregating physico-chemical features projected onto those points from neighboring protein atoms. The points with high predicted ligandability are clustered and ranked to obtain the resulting list of binding site predictions. The new method was compared with Fpocket, a state-of-the-art binding site prediction method, on three representative datasets. The results show that P2RANK outperforms Fpocket by 10 to 20 percentage points on all the datasets. Moreover, since P2RANK does not rely on any external software to compute complex features such as sequence conservation scores or binding energies, it is an ideal tool for inclusion in future structural bioinformatics pipelines.
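The aggregation, clustering, and ranking steps described in the abstract can be illustrated with a minimal sketch. It is not the P2RANK implementation: the function names, the linear distance weighting, the 6.0 and 3.0 distance cutoffs, the 0.5 score threshold, and the sum-of-scores ranking are all assumptions chosen for illustration. The Random Forests classifier is omitted, so the per-point ligandability scores are passed in as if a trained model had already produced them.

```python
import math
from collections import namedtuple

# Hypothetical atom record: 3D coordinates plus a per-atom feature vector.
Atom = namedtuple("Atom", "xyz features")

def aggregate_features(point, atoms, radius=6.0):
    """Distance-weighted sum of atom feature vectors around one surface point.
    The linear weight and the radius are illustrative assumptions."""
    agg = [0.0] * len(atoms[0].features)
    for atom in atoms:
        d = math.dist(point, atom.xyz)
        if d <= radius:
            w = 1.0 - d / radius  # closer atoms contribute more
            for i, f in enumerate(atom.features):
                agg[i] += w * f
    return agg

def cluster_points(points, cutoff=3.0):
    """Single-linkage clustering: points within `cutoff` share a cluster."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= cutoff]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return clusters

def rank_sites(points, scores, threshold=0.5, cutoff=3.0):
    """Keep high-scoring points, cluster them, and rank clusters by the
    sum of member scores (an assumed ranking rule)."""
    keep = [i for i, s in enumerate(scores) if s >= threshold]
    kept_points = [points[i] for i in keep]
    sites = []
    for cluster in cluster_points(kept_points, cutoff):
        members = [keep[i] for i in cluster]
        sites.append((sum(scores[i] for i in members), members))
    return sorted(sites, key=lambda site: site[0], reverse=True)

# Toy input: six surface points on a line with made-up ligandability scores.
points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (11, 0, 0), (20, 0, 0)]
scores = [0.9, 0.8, 0.7, 0.6, 0.6, 0.1]
ranked = rank_sites(points, scores)
```

In this toy input, the first three points form one predicted site and the next two form a second, lower-ranked site; the last point falls below the score threshold and is discarded before clustering.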
Keywords: Ligand-binding site prediction, Protein structure, Molecular recognition, Machine learning, Random forest
This work was supported by the Czech Science Foundation (grant 14-29032P), by project SVV-2015-260222, and by the Charles University in Prague (project GA UK No. 174615).