Pattern Recognition in Bioinformatics

Volume 7632 of the series Lecture Notes in Computer Science, pp. 14-25

Machine Learning Scoring Functions Based on Random Forest and Support Vector Regression

  • Pedro J. Ballester. Affiliated with: Carnegie Mellon University; European Bioinformatics Institute




Accurately predicting the binding affinities of large sets of diverse molecules against a range of macromolecular targets is an extremely challenging task. Scoring functions, which attempt this prediction computationally by exploiting structural data, are essential for analysing the outputs of molecular docking, which is in turn an important technique for drug discovery, chemical biology and structural biology. Conventional scoring functions assume a predetermined, theory-inspired functional form for the relationship between the variables that characterise the complex and its predicted binding affinity. The inherent problem with this approach lies in the difficulty of explicitly modelling the various contributions of intermolecular interactions to binding affinity.
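To make the idea of a predetermined functional form concrete, here is a minimal sketch assuming a hypothetical empirical scoring function that is a weighted sum of three interaction terms. The term names (`hbond`, `hydrophobic`, `rot_bonds`), the linear form and the helper names are illustrative assumptions, not taken from this chapter. Only the weights are fitted to data; the form itself is fixed before any data are seen.

```python
import numpy as np

# Hypothetical interaction terms (assumption for illustration); a real empirical
# scoring function would compute such terms from the 3D structure of each complex.
TERMS = ("hbond", "hydrophobic", "rot_bonds")


def fit_linear_scoring_function(X, y):
    """Fit the weights of the fixed form pKd = w0 + sum_k w_k * x_k by least squares.

    X: (n_complexes, len(TERMS)) array of term values; y: measured affinities.
    """
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def score(w, x):
    """Predicted affinity of one complex under the predetermined additive form."""
    return float(w[0] + np.dot(w[1:], x))


def weights_by_term(w):
    """Label the fitted weights with the (hypothetical) term names."""
    return {"intercept": float(w[0]), **{t: float(v) for t, v in zip(TERMS, w[1:])}}
```

Whatever the training set, a model like this can only rescale the chosen terms, so any contribution to binding affinity that the assumed form does not express cannot be learned.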

Recently, a new family of 3D structure-based regression models for binding affinity prediction has been introduced that circumvents the need for such modelling assumptions. These machine learning scoring functions have been shown to outperform conventional scoring functions by a wide margin. However, no direct comparison among machine learning scoring functions has been made to date. Here, the performance of the two most popular machine learning scoring functions for this task, based on Random Forest (RF) and Support Vector Regression (SVR), is analysed under exactly the same experimental conditions.
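A minimal sketch of what a comparison under identical conditions can look like, assuming scikit-learn's `RandomForestRegressor` and `SVR` as stand-ins for the two model types and purely synthetic descriptors. The data generator, hyperparameters, metrics and function names below are illustrative assumptions, not the chapter's protocol or results. Both models receive the same training/test split, the same features and the same evaluation metrics, so any difference in scores comes from the models alone.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def synthetic_complexes(n=500, n_features=6, seed=0):
    """Hypothetical stand-in for complex descriptors and measured affinities.

    The values are random and carry no chemical meaning.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 10.0, size=(n, n_features))
    # A nonlinear relationship that a fixed additive form would not capture.
    y = 2.0 + 0.5 * X[:, 0] + np.sin(X[:, 1]) * X[:, 2] / 3.0 + rng.normal(0.0, 0.3, n)
    return X, y


def pearson_r(a, b):
    """Pearson correlation between predicted and measured affinities."""
    return float(np.corrcoef(a, b)[0, 1])


def compare_scoring_functions(X, y, seed=0):
    # Identical conditions: one split, one feature set, one set of metrics.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=seed)
    models = {
        "RF": RandomForestRegressor(n_estimators=200, random_state=seed),
        # SVR is sensitive to feature scale, so standardise inside the pipeline.
        "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1)),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        rmse = float(np.sqrt(np.mean((pred - y_te) ** 2)))
        results[name] = {"Rp": pearson_r(pred, y_te), "RMSE": rmse}
    return results


if __name__ == "__main__":
    X, y = synthetic_complexes()
    for name, scores in compare_scoring_functions(X, y).items():
        print(name, round(scores["Rp"], 3), round(scores["RMSE"], 3))
```

Fixing the split and random seeds keeps the comparison reproducible; a fuller study would also tune each model's hyperparameters with the same procedure, for example cross-validation on the training set only.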


Keywords: molecular docking; scoring functions; machine learning; chemical informatics; structural bioinformatics