Pharmacophore features for machine learning in pharmaceutical virtual screening

  • Original Article
Molecular Diversity

Abstract

Methods of three-dimensional molecular alignment generally treat all pharmacophore features equally when superimposing molecules. However, some pharmacophore features may be more important than others in a specific system. In this work, we derived the overlap volumes of pharmacophore features from a molecular alignment approach and used them as new molecular features for building machine learning models, in which each feature can be assigned a weight indicating its importance. Validated on the DUD-E collection, models based on pharmacophore features represented by overlap volumes performed strongly, with a median AUC of approximately 0.98 and a recall rate of almost 0.8.
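The core idea, turning per-feature overlap volumes from an alignment into a descriptor vector that a model can weight, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the query and candidate pharmacophores are already aligned, models each feature as one spherical Gaussian with illustrative parameters (`ALPHA`, `P`) in the spirit of Grant et al.'s Gaussian description of molecular shape, uses a hypothetical feature-type list (`FEATURE_TYPES`), and applies hand-set weights where the paper's models would learn them. AUC is computed as the Mann–Whitney rank statistic, and recall at an arbitrary, hypothetical score threshold.

```python
import math
from collections import defaultdict

# Hypothetical feature-type vocabulary (assumption; not taken from the paper).
FEATURE_TYPES = ("donor", "acceptor", "hydrophobic", "aromatic", "positive", "negative")

# Illustrative Gaussian parameters (assumptions): amplitude P and exponent ALPHA.
P = 2.0 * math.sqrt(2.0)
ALPHA = 1.0


def gaussian_overlap(a, b, alpha=ALPHA, p=P):
    """Overlap integral of two identical spherical Gaussians p*exp(-alpha*r^2) centred at a and b."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    s = 2.0 * alpha
    return p * p * math.exp(-alpha * alpha * d2 / s) * (math.pi / s) ** 1.5


def overlap_features(query, candidate):
    """Per-type overlap volumes between two pharmacophores that are already aligned.

    Each pharmacophore is a list of (feature_type, (x, y, z)) tuples; only
    features of the same type contribute to that type's volume.
    """
    vol = defaultdict(float)
    for q_type, q_xyz in query:
        for c_type, c_xyz in candidate:
            if q_type == c_type:
                vol[q_type] += gaussian_overlap(q_xyz, c_xyz)
    return [vol[t] for t in FEATURE_TYPES]


def weighted_score(features, weights):
    """Linear score: each feature type's overlap volume scaled by its weight."""
    return sum(w * v for w, v in zip(weights, features))


def roc_auc(scores, labels):
    """AUC as the probability that a random active outscores a random decoy (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def recall(predicted, labels):
    """Fraction of true actives that were predicted active."""
    tp = sum(1 for yp, y in zip(predicted, labels) if yp == 1 and y == 1)
    return tp / sum(labels)


if __name__ == "__main__":
    query = [("donor", (0.0, 0.0, 0.0)), ("aromatic", (3.0, 0.0, 0.0))]
    active = [("donor", (0.3, 0.0, 0.0)), ("aromatic", (3.1, 0.2, 0.0))]
    decoy = [("acceptor", (0.0, 0.0, 0.0)), ("aromatic", (6.0, 0.0, 0.0))]
    # Hand-set weights for illustration; a trained model would learn these.
    weights = [2.0, 1.0, 1.0, 0.5, 1.0, 1.0]
    scores = [weighted_score(overlap_features(query, m), weights) for m in (active, decoy)]
    labels = [1, 0]
    print(roc_auc(scores, labels))
    # Hypothetical threshold of 1.0 for calling a molecule active.
    print(recall([1 if s > 1.0 else 0 for s in scores], labels))
```

Because the descriptor keeps one overlap volume per feature type, a learned weight for, say, `donor` can exceed the weight for `hydrophobic`; this is the per-feature importance that alignment scores treating all features equally cannot express.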

Graphic abstract

Fig. 1
Fig. 2

References

  1. Ballester PJ, Richards WG (2007) Ultrafast shape recognition to search compound databases for similar molecular shapes. J Comput Chem 28:1711–1723. https://doi.org/10.1002/jcc.20681

  2. Mavridis L, Hudson BD, Ritchie DW (2007) Toward high throughput 3D virtual screening using spherical harmonic surface representations. J Chem Inf Model 47:1787–1796. https://doi.org/10.1021/ci7001507

  3. Nicholls A, McGaughey GB, Sheridan RP, Good AC, Warren G, Mathieu M, Muchmore SW, Brown SP, Grant JA, Haigh JA, Nevins N, Jain AN, Kelley B (2010) Molecular shape and medicinal chemistry: a perspective. J Med Chem 53:3862–3886. https://doi.org/10.1021/jm900818s

  4. Vainio MJ, Puranen JS, Johnson MS (2009) ShaEP: molecular overlay based on shape and electrostatic potential. J Chem Inf Model 49:492–502. https://doi.org/10.1021/ci800315d

  5. Liu X, Jiang H, Li H (2011) SHAFTS: a hybrid approach for 3D molecular similarity calculation. 1. Method and assessment of virtual screening. J Chem Inf Model 51:2372–2385. https://doi.org/10.1021/ci200060s

  6. Hawkins PC, Skillman AG, Nicholls A (2007) Comparison of shape-matching and docking as virtual screening tools. J Med Chem 50:74–82. https://doi.org/10.1021/jm0603365

  7. Yan X, Li J, Liu Z, Zheng M, Ge H, Xu J (2013) Enhancing molecular shape comparison by weighted Gaussian functions. J Chem Inf Model 53:1967–1978. https://doi.org/10.1021/ci300601q

  8. Grant JA, Gallardo MA, Pickup BT (1996) A fast method of molecular shape comparison: a simple application of a Gaussian description of molecular shape. J Comput Chem 17:1653–1666. https://doi.org/10.1002/(SICI)1096-987X(19961115)17:14%3c1653:AID-JCC7%3e3.0.CO;2-K

  9. Güner OF (2000) Pharmacophore perception, development, and use in drug design. International University Line, La Jolla

  10. Kearnes S, Pande V (2016) ROCS-derived features for virtual screening. J Comput Aided Mol Des 30:609–617. https://doi.org/10.1007/s10822-016-9959-3

  11. Melville JL, Burke EK, Hirst JD (2009) Machine learning in virtual screening. Comb Chem High Throughput Screen 12:332–343. https://doi.org/10.2174/138620709788167980

  12. Eckert H, Bajorath J (2007) Molecular similarity analysis in virtual screening: foundations, limitations and novel approaches. Drug Discov Today 12:225–233. https://doi.org/10.1016/j.drudis.2007.01.011

  13. Jorissen RN, Gilson MK (2005) Virtual screening of molecular databases using a support vector machine. J Chem Inf Model 45:549–561. https://doi.org/10.1021/ci049641u

  14. Heikamp K, Bajorath J (2014) Support vector machines for drug discovery. Expert Opin Drug Discov 9:93–104. https://doi.org/10.1517/17460441.2014.866943

  15. Breiman L (2001) Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324

  16. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, USA, August 13–17, pp 785–794. https://doi.org/10.1145/2939672.2939785

  17. Breiman L, Friedman JH, Olshen RA, Stone CJ (2017) Classification and regression trees. Routledge, London

  18. Mysinger MM, Carchia M, Irwin JJ, Shoichet BK (2012) Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. J Med Chem 55:6582–6594. https://doi.org/10.1021/jm300687e

  19. Mason J, Good A, Martin EJ (2001) 3-D pharmacophores in drug discovery. Curr Pharm Des 7:567–597. https://doi.org/10.2174/1381612013397843

  20. Li J, Ehlers T, Sutter J, Varma-O’Brien S, Kirchmair J (2007) CAESAR: a new conformer generation algorithm based on recursive buildup and local rotational symmetry consideration. J Chem Inf Model 47:1923–1932. https://doi.org/10.1021/ci700136x

  21. Accelrys Software Inc (2012) Discovery Studio modeling environment, release 3.5. Accelrys Software Inc, San Diego

  22. Kuhn M (2008) Building predictive models in R using the caret package. J Stat Softw 28:1–26. https://doi.org/10.18637/jss.v028.i05

  23. R Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

  24. Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat Softw 33:1–22. https://doi.org/10.18637/jss.v033.i01

  25. Karatzoglou A, Smola A, Hornik K, Zeileis A (2004) kernlab – an S4 package for kernel methods in R. J Stat Softw 11:1–20. https://doi.org/10.18637/jss.v011.i09

  26. Chen T, He T, Benesty M, Khotilovich V, Tang Y (2016) Xgboost: extreme gradient boosting. R package version 0.71.2

Funding

This research was funded by the Taishan Scholar Program of Shandong Province (tsqn201812159) and the Foundation of Clinical Pharmacy of Chinese Medical Association (LCYX-M008).

Author information

Corresponding author

Correspondence to Pei Jiang.

Ethics declarations

Conflicts of interest

The authors declare no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Xiaojing Wang and Wenxiu Han are co-first authors.

About this article

Cite this article

Wang, X., Han, W., Yan, X. et al. Pharmacophore features for machine learning in pharmaceutical virtual screening. Mol Divers 24, 407–412 (2020). https://doi.org/10.1007/s11030-019-09961-4

  • DOI: https://doi.org/10.1007/s11030-019-09961-4
