IAPR International Conference on Pattern Recognition in Bioinformatics

PRIB 2012: Pattern Recognition in Bioinformatics, pp 14–25

Machine Learning Scoring Functions Based on Random Forest and Support Vector Regression

  • Pedro J. Ballester
Part of the Lecture Notes in Computer Science book series (LNBI, volume 7632)

Abstract

Accurately predicting the binding affinities of large sets of diverse molecules against a range of macromolecular targets is an extremely challenging task. The scoring functions that attempt such computational prediction by exploiting structural data are essential for analysing the outputs of molecular docking, which is in turn an important technique for drug discovery, chemical biology and structural biology. Conventional scoring functions assume a predetermined, theory-inspired functional form for the relationship between the variables that characterise the complex and its predicted binding affinity. The inherent problem with this approach is the difficulty of explicitly modelling the various contributions of intermolecular interactions to binding affinity.

Recently, a new family of 3D structure-based regression models for binding affinity prediction has been introduced that circumvents the need for modelling assumptions. These machine learning scoring functions have been shown to widely outperform conventional scoring functions. However, to date no direct comparison among machine learning scoring functions has been made. Here the performance of the two most popular machine learning scoring functions for this task is analysed under exactly the same experimental conditions.
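
As a rough illustration of what comparing two scoring functions "under exactly the same experimental conditions" can look like in practice, the sketch below scores two sets of predictions against one shared set of measured affinities using the same metrics. Everything in it is an assumption made for illustration: the choice of Pearson correlation and RMSE as metrics, the function names, the model labels, and the toy numbers, none of which come from the paper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def rmse(predicted, measured):
    """Root-mean-square error of predictions against measurements."""
    squared = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return math.sqrt(squared / len(measured))

def compare(measured, predictions_by_model):
    """Evaluate every model on the same test set with the same metrics."""
    return {
        name: {"pearson_r": pearson_r(measured, pred), "rmse": rmse(pred, measured)}
        for name, pred in predictions_by_model.items()
    }

# Toy values only (hypothetical affinities, not data from the paper).
measured = [4.2, 6.1, 7.5, 5.0, 8.3]
predictions = {
    "model_a": [4.8, 5.9, 7.0, 5.5, 7.9],  # hypothetical model output
    "model_b": [5.5, 5.6, 6.4, 5.9, 6.8],  # hypothetical model output
}
report = compare(measured, predictions)
for name, metrics in report.items():
    print(name, metrics)
```

Holding the test set and the metrics fixed means any difference in the report can be attributed to the models rather than to the evaluation setup.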

Keywords

  • molecular docking
  • scoring functions
  • machine learning
  • chemical informatics
  • structural bioinformatics

Author information

Authors and Affiliations

  1. European Bioinformatics Institute, Cambridge, UK

    Pedro J. Ballester

Editor information

Editors and Affiliations

  1. Institute of Medical Science, University of Tokyo, 4-6-1, Shirokanedai, 108-8639, Minato-ku, Tokyo, Japan

    Tetsuo Shibuya

  2. Department of Mathematical Informatics, The University of Tokyo, 7-3-1 Hongo, 113-8654, Bunkyo-ku, Tokyo, Japan

    Hisashi Kashima

  3. Department of Computer Science, Tokyo Institute of Technology, 2-12-1 Ookayama, 152-8550, Meguro-ku, Tokyo, Japan

    Jun Sese

  4. Bioinformatics Project, National Institute of Biomedical Innovation, 7-6-8 Saito-Asagi, 567-0085, Suita, Osaka, Japan

    Shandar Ahmad

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ballester, P.J. (2012). Machine Learning Scoring Functions Based on Random Forest and Support Vector Regression. In: Shibuya, T., Kashima, H., Sese, J., Ahmad, S. (eds) Pattern Recognition in Bioinformatics. PRIB 2012. Lecture Notes in Computer Science, vol 7632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34123-6_2

  • DOI: https://doi.org/10.1007/978-3-642-34123-6_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34122-9

  • Online ISBN: 978-3-642-34123-6

  • eBook Packages: Computer Science (R0)

Published in cooperation with the International Association for Pattern Recognition (http://www.iapr.org/)
