Comparison of Cutoff Strategies for Geometrical Features in Machine Learning-Based Scoring Functions
Counts of protein-ligand contacts are popular geometrical features in scoring functions for structure-based drug design. When these features are extracted, cutoff values define the range of distances within which a protein-ligand atom pair is considered to be in contact. However, it is unclear how the number of ranges and the choice of cutoff values affect the predictive ability of scoring functions. Here, we compare five cutoff strategies (one-, two-, three- and six-range, and soft boundary) using four machine learning methods. Prediction models are constructed using the latest PDBbind v2012 data sets and assessed by correlation coefficients. Our results show that the optimal one-range cutoff value lies between 6 and 8 Å rather than at the customary choice of 12 Å. In general, two-range models improve predictive performance, measured by correlation coefficients, by 3–5%, but introducing more cutoff ranges does not always improve prediction accuracy.
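The following Python sketch contrasts hard cutoff ranges with a soft boundary. It counts protein-ligand atom pairs per element pair and per distance range. It is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the element lists, the range convention, or the soft-boundary function. The following are therefore hypothetical choices:

* the `PROTEIN_ELEMENTS` / `LIGAND_ELEMENTS` tuples,
* half-open ranges [c(k-1), c(k)) starting at 0 Å,
* the logistic weight in `logistic_weight`.

Atoms are assumed to be `(element, (x, y, z))` tuples with coordinates in Å.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical element types (assumption; the abstract does not list them).
PROTEIN_ELEMENTS = ("C", "N", "O", "S")
LIGAND_ELEMENTS = ("C", "N", "O", "F", "P", "S", "Cl", "Br", "I")


def hard_range_counts(protein, ligand, cutoffs):
    """Count atom pairs per (protein element, ligand element, range index).

    `cutoffs` holds increasing upper bounds in angstroms. Range k is assumed
    to be the half-open interval [cutoffs[k-1], cutoffs[k]), with the first
    range starting at 0. Pairs at or beyond the last cutoff are ignored.
    """
    features = Counter()
    for (prot_el, prot_xyz), (lig_el, lig_xyz) in product(protein, ligand):
        d = math.dist(prot_xyz, lig_xyz)
        for k, upper in enumerate(cutoffs):
            if d < upper:
                features[(prot_el, lig_el, k)] += 1
                break  # each pair falls into at most one range
    return features


def logistic_weight(d, cutoff, width):
    """Hypothetical soft boundary: ~1 well inside the cutoff, ~0 well outside."""
    z = (d - cutoff) / width
    if z >= 0:  # numerically stable form, avoids overflow for large z
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def soft_counts(protein, ligand, cutoff, width):
    """Sum smooth contact weights instead of 0/1 counts (single range)."""
    features = Counter()
    for (prot_el, prot_xyz), (lig_el, lig_xyz) in product(protein, ligand):
        d = math.dist(prot_xyz, lig_xyz)
        features[(prot_el, lig_el, 0)] += logistic_weight(d, cutoff, width)
    return features


def to_vector(features, n_ranges):
    """Flatten into a fixed column order so every complex has the same layout."""
    return [
        features[(prot_el, lig_el, k)]  # Counter yields 0 for absent keys
        for prot_el in PROTEIN_ELEMENTS
        for lig_el in LIGAND_ELEMENTS
        for k in range(n_ranges)
    ]


if __name__ == "__main__":
    protein = [("C", (0.0, 0.0, 0.0)), ("N", (5.0, 0.0, 0.0))]
    ligand = [("O", (3.0, 0.0, 0.0)), ("C", (0.0, 9.0, 0.0))]
    one_range = hard_range_counts(protein, ligand, [12.0])
    two_range = hard_range_counts(protein, ligand, [4.0, 8.0])
    print(to_vector(one_range, 1).count(1))  # prints 4
    print(sum(to_vector(two_range, 2)))      # prints 2
```

With `cutoffs=[12.0]` the sketch reduces to a single-range count at the customary 12 Å. `cutoffs=[4.0, 8.0]` gives a two-range feature set. Each additional cutoff adds one more column per element pair. The fixed column order from `to_vector` lets any regression method consume the features directly.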
Keywords: scoring function; protein-ligand binding affinity; geometrical features; machine learning; structure-based drug design