Practical Chemoinformatics

pp 133-194


Machine Learning Methods in Chemoinformatics for Drug Discovery

  • Muthukumarasamy Karthikeyan (Digital Information Resource Centre, National Chemical Laboratory)
  • Renu Vyas (Scientist (DST), Division of Chemical Engineering and Process Development, National Chemical Laboratory)



It is well known that the structure of a molecule determines its biological activity and physicochemical properties. Here, we describe the role of machine learning (ML) and statistical methods in building reliable, predictive models in chemoinformatics. ML methods are broadly divided into clustering, classification and regression techniques. Within these categories, the statistical and mathematical techniques that form part of the ML toolkit, such as artificial neural networks, hidden Markov models, support vector machines, decision tree learning, Random Forest, Naive Bayes and belief networks, are best suited for drug discovery and play an important role in the lead identification and lead optimization steps. This chapter provides stepwise procedures for building ML-based classification and regression models using state-of-the-art open-source and proprietary tools. A few case studies on benchmark data sets demonstrate the efficacy of ML-based classification for drug design.


Machine learning, Neural networks, SVM, SVR, Genetic programming, Chemoinformatics, Drug design
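
As a rough illustration of the classification setting the abstract describes, the sketch below fits a Naive Bayes classifier (one of the methods listed) to binary structural fingerprints. It is not the chapter's own procedure: the function names `train_bernoulli_nb` and `predict`, the 4-bit fingerprints and the active/inactive labels are all hypothetical and invented for this example, and a real study would use descriptors computed by a chemoinformatics toolkit and a proper train/test split.

```python
import math


def train_bernoulli_nb(fingerprints, labels, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on binary fingerprints.

    Returns a dict mapping each class to (log prior, per-bit probabilities).
    alpha is the Laplace smoothing constant (an assumption of this sketch).
    """
    classes = sorted(set(labels))
    n_bits = len(fingerprints[0])
    model = {}
    for c in classes:
        rows = [fp for fp, y in zip(fingerprints, labels) if y == c]
        # Smoothed probability that each bit is set among molecules of class c
        bit_probs = [
            (sum(fp[i] for fp in rows) + alpha) / (len(rows) + 2 * alpha)
            for i in range(n_bits)
        ]
        model[c] = (math.log(len(rows) / len(labels)), bit_probs)
    return model


def predict(model, fp):
    """Return the class with the highest log posterior for one fingerprint."""
    best_class, best_score = None, -math.inf
    for c, (log_prior, bit_probs) in model.items():
        score = log_prior
        for bit, p in zip(fp, bit_probs):
            # Bernoulli likelihood: p if the bit is set, 1 - p otherwise
            score += math.log(p if bit else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical toy data: each row is a made-up 4-bit substructure fingerprint.
X = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
y = ["active", "active", "active", "inactive", "inactive", "inactive"]

model = train_bernoulli_nb(X, y)
prediction = predict(model, [1, 1, 0, 0])
```

Swapping the class labels for a continuous property (for example a measured activity value) and the classifier for a regressor would give the regression counterpart of the same workflow.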