Comparison of Cutoff Strategies for Geometrical Features in Machine Learning-Based Scoring Functions

  • Shirley W. I. Siu
  • Thomas K. F. Wong
  • Simon Fong
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8347)


Counts of protein-ligand contacts are popular geometrical features in scoring functions for structure-based drug design. When extracting these features, cutoff values define the range of distances within which a protein-ligand atom pair is considered to be in contact. However, the effects of the number of ranges and of the choice of cutoff values on the predictive ability of scoring functions are unclear. Here, we compare five cutoff strategies (one-, two-, three- and six-range, and soft boundary) with four machine learning methods. Prediction models are constructed using the latest PDBbind v2012 data sets and assessed by correlation coefficients. Our results show that the optimal one-range cutoff value lies between 6 and 8 Å instead of the customary choice of 12 Å. In general, two-range models improve predictive performance, as measured by correlation coefficients, by 3–5%, but introducing more cutoff ranges does not always improve prediction accuracy.
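The contact-counting features described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes features are keyed by protein/ligand element pairs (in the style of the element-pair counts used by RF-Score), uses hypothetical element lists, and models the soft boundary as a logistic switching function, because the abstract does not give the exact soft-boundary form. A one-range strategy corresponds to a single cutoff such as `[12.0]`; multi-range strategies simply supply more ascending cutoffs.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical element types used only for illustration; the abstract does not
# specify which protein or ligand atom types the features are keyed on.
PROTEIN_ELEMENTS = ("C", "N", "O", "S")
LIGAND_ELEMENTS = ("C", "N", "O", "S")


def range_index(d, cutoffs):
    """Return the index of the distance range containing d, or None if d lies beyond the last cutoff.

    cutoffs are ascending upper bounds in angstroms, so [6.0, 12.0] defines
    the two ranges [0, 6) and [6, 12).
    """
    for i, upper in enumerate(cutoffs):
        if d < upper:
            return i
    return None


def hard_counts(protein, ligand, cutoffs):
    """Count protein-ligand atom pairs per (protein element, ligand element, range index).

    protein and ligand are lists of (element, (x, y, z)) tuples.
    """
    counts = Counter()
    for (p_elem, p_xyz), (l_elem, l_xyz) in product(protein, ligand):
        k = range_index(math.dist(p_xyz, l_xyz), cutoffs)
        if k is not None:
            counts[(p_elem, l_elem, k)] += 1
    return counts


def soft_counts(protein, ligand, cutoff, width=0.5):
    """Soft-boundary variant (assumed logistic form): each pair adds a weight in (0, 1).

    The weight is 1 / (1 + exp((d - cutoff) / width)): close to 1 well inside
    the cutoff, exactly 0.5 at the cutoff, and close to 0 well beyond it.
    """
    counts = Counter()
    for (p_elem, p_xyz), (l_elem, l_xyz) in product(protein, ligand):
        z = (math.dist(p_xyz, l_xyz) - cutoff) / width
        if z > 50.0:  # weight < 2e-22; skipping also avoids math.exp overflow
            continue
        counts[(p_elem, l_elem)] += 1.0 / (1.0 + math.exp(z))
    return counts


def to_vector(counts, n_ranges):
    """Flatten hard counts into a fixed-order feature vector for a regression model."""
    return [
        counts.get((p, l, k), 0)
        for p in PROTEIN_ELEMENTS
        for l in LIGAND_ELEMENTS
        for k in range(n_ranges)
    ]


# Toy complex: one ligand carbon at 1, 4 and 9 Å from three protein atoms.
protein = [("C", (0.0, 0.0, 0.0)), ("N", (5.0, 0.0, 0.0)), ("O", (10.0, 0.0, 0.0))]
ligand = [("C", (1.0, 0.0, 0.0))]
one_range = hard_counts(protein, ligand, [8.0])        # O-C pair at 9 Å is dropped
two_range = hard_counts(protein, ligand, [6.0, 12.0])  # O-C pair falls in range 1
```

Each strategy changes only the `cutoffs` argument, so comparing one-, two-, three- and six-range models amounts to regenerating `to_vector(...)` with different cutoff lists before training. The cutoff values in the toy example are illustrative, not the ones evaluated in the paper.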


Keywords: scoring function, protein-ligand binding affinity, geometrical features, machine learning, structure-based drug design





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Shirley W. I. Siu (1)
  • Thomas K. F. Wong (1)
  • Simon Fong (1)
  1. Department of Computer and Information Science, University of Macau, Macau, China
