On the ability of machine learning methods to discover novel scaffolds

Jagdev, Rishi; Madsen, Thomas Bruun; Finn, Paul W.

doi:10.1007/s00894-022-05359-6

Original Paper
Published: 27 December 2022

Volume 29, article number 22 (2023)
Journal of Molecular Modeling

Rishi Jagdev¹,
Thomas Bruun Madsen² &
Paul W. Finn¹,³

Abstract

Recent advances in the application of machine learning to drug discovery have made it a ‘hot topic’ for research, with hundreds of academic groups and companies integrating machine learning into their drug discovery projects. Nevertheless, there remains great uncertainty regarding the most appropriate ways to evaluate the relative performance of these powerful methods against more traditional cheminformatics approaches, and many pitfalls remain for the unwary. In 2020, researchers at MIT (Stokes et al., Cell 180(4), 688–702, 2020) reported the discovery of a new compound with antibacterial activity, halicin, through the use of a neural network machine learning method. A robust ability to identify new active chemotypes through computational methods would be very useful. In this study, we have used the Stokes et al. dataset to compare the performance of this method to two other approaches, Mapping of Activity Through Dichotomic Scores (MADS) by Todeschini et al. (J Chemom 32(4):e2994, 2018) and Random Matrix Theory (RMT) by Lee et al. (Proc Natl Acad Sci 116(9):3373–3378, 2019). Our results demonstrate that all three methods are capable of predicting halicin as an active antibacterial compound, but that this result depends on the dataset composition, the pre-processing and the molecular fingerprint used. We have further assessed overall performance using several performance metrics. We also investigated the scaffold hopping potential of the methods by removing the β-lactam and fluoroquinolone chemotypes from the dataset. MADS and RMT are able to identify actives in the test set that contain these substructures. This ability arises because high-scoring fragments of the withheld chemotypes are shared with other active antibiotic classes. Interestingly, MADS performs relatively better than the other two methods in terms of general predictive performance.

References

1. Yanling J, Xin L, Zhiyuan L (2013) The antibacterial drug discovery. Drug Discovery, pp 289–307
2. Aminov RI (2010) A brief history of the antibiotic era: lessons learned and challenges for the future. Front Microbiol 1:134
3. Laxminarayan R, Duse A, Wattal C et al (2013) Antibiotic resistance—the need for global solutions. Lancet Infect Dis 13(12):1057–1098
4. Goh GB, Hodas NO, Vishnu A (2017) Deep learning for computational chemistry. J Comput Chem 38(16):1291–1307
5. Scarselli F, Gori M, Tsoi AC et al (2008) The graph neural network model. IEEE Trans Neural Netw 20(1):61–80
6. Baskin II, Winkler D, Tetko IV (2016) A renaissance of neural networks in drug discovery. Expert Opin Drug Discov 11(8):785–795
7. Salt DW, Yildiz N, Livingstone DJ et al (1992) The use of artificial neural networks in QSAR. Pestic Sci 36(2):161–170
8. Ghasemi F, Mehridehnavi A, Perez-Garrido A et al (2018) Neural network and deep-learning algorithms used in QSAR studies: merits and drawbacks. Drug Discov Today 23(10):1784–1790
9. Staszak M, Staszak K, Wieszczycka K et al (2021) Machine learning in drug design: use of artificial intelligence to explore the chemical structure–biological activity relationship. Wiley Interdisciplinary Reviews: Computational Molecular Science, e1568
10. Mayr A, Klambauer G, Unterthiner T et al (2018) Large-scale comparison of machine learning methods for drug target prediction on ChEMBL. Chem Sci 9(24):5441–5451
11. Lenselink EB, Ten Dijke N, Bongers B et al (2017) Beyond the hype: deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set. J Cheminformatics 9(1):1–14
12. Gaulton A, Bellis LJ, Bento AP et al (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 40(D1):D1100–D1107
13. Truchon JF, Bayly CI (2007) Evaluating virtual screening methods: good and bad metrics for the “early recognition” problem. J Chem Inf Model 47(2):488–508
14. Koutsoukas A, Monaghan KJ, Li X et al (2017) Deep-learning: investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data. J Cheminformatics 9(1):1–13
15. Duvenaud D, Maclaurin D, Aguilera-Iparraguirre J et al (2015) Convolutional networks on graphs for learning molecular fingerprints. arXiv:1509.09292
16. Withnall M, Lindelöf E, Engkvist O et al (2020) Building attention and edge message passing neural networks for bioactivity and physical–chemical property prediction. J Cheminformatics 12(1):1–18
17. Jiang D, Wu Z, Hsieh CY et al (2021) Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models. J Cheminformatics 13(1):1–23
18. Robinson MC, Glen RC et al (2020) Validating the validation: reanalyzing a large-scale comparison of deep learning and machine learning models for bioactivity prediction. J Comput Aided Mol Des, pp 1–14
19. Pérez-Sianes J, Pérez-Sánchez H, Díaz F (2016) Virtual screening: a challenge for deep learning. In: International Conference on Practical Applications of Computational Biology & Bioinformatics. Springer, pp 13–22
20. Bajorath J (2017) Computational scaffold hopping: cornerstone for the future of drug design?
21. Schneider G, Neidhart W, Giller T et al (1999) “Scaffold-hopping” by topological pharmacophore search: a contribution to virtual screening. Angew Chem Int Ed 38(19):2894–2896
22. Vainio MJ, Kogej T, Raubacher F et al (2013) Scaffold hopping by fragment replacement
23. Saluste G, Albarran MI, Alvarez RM et al (2012) Fragment-hopping-based discovery of a novel chemical series of proto-oncogene PIM-1 kinase inhibitors. PLoS One 7(10):e45964
24. Ertl P (2012) Database of bioactive ring systems with calculated properties and its use in bioisosteric design and scaffold hopping. Bioorg Med Chem 20(18):5436–5442
25. Stokes JM, Yang K, Swanson K et al (2020) A deep learning approach to antibiotic discovery. Cell 180(4):688–702
26. Todeschini R, Consonni V, Ballabio D et al (2018) Mapping of activity through dichotomic scores (MADS): a new chemoinformatic approach to detect activity-rich structural regions. J Chemom 32(4):e2994
27. Lee AA, Yang Q, Bassyouni A et al (2019) Ligand biological activity predicted by cleaning positive and negative chemical correlations. Proc Natl Acad Sci 116(9):3373–3378
28. Chemical Computing Group Inc (2019) Molecular Operating Environment (MOE)
29. Corsello SM, Bittker JA, Liu Z et al (2017) The drug repurposing hub: a next-generation drug library and information resource. Nat Med 23(4):405–408
30. Cereto-Massagué A, Ojeda MJ, Valls C et al (2015) Molecular fingerprint similarity search in virtual screening. Methods 71:58–63
31. Willett P (2006) Similarity-based virtual screening using 2D fingerprints. Drug Discov Today 11(23–24):1046–1053
32. Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert Opin Drug Discov 11(2):137–148
33. Riniker S, Landrum GA (2013) Open-source platform to benchmark fingerprints for ligand-based virtual screening. J Cheminformatics 5(1):1–17
34. Wale N, Watson IA, Karypis G (2008) Comparison of descriptor spaces for chemical compound retrieval and classification. Knowl Inf Syst 14(3):347–375
35. Russo DP, Zorn KM, Clark AM et al (2018) Comparing multiple machine learning algorithms and metrics for estrogen receptor binding prediction. Mol Pharm 15(10):4361–4370
36. Kensert A, Alvarsson J, Norinder U et al (2018) Evaluating parameters for ligand-based modeling with random forest on sparse data sets. J Cheminformatics 10(1):1–10
37. Chen B, Harrison RF, Papadatos G et al (2007) Evaluation of machine-learning methods for ligand-based virtual screening. J Comput Aided Mol Des 21(1):53–62
38. MDL Information Systems Inc (1984) MACCS keys. San Leandro, CA
39. Nilakantan R, Bauman N, Dixon JS et al (1987) Topological torsion: a new molecular descriptor for SAR applications. Comparison with other descriptors. J Chem Inf Comput Sci 27(2):82–85
40. Landrum G (2013) RDKit documentation. Release 1(1-79):4
41. Lee AA, Brenner MP, Colwell LJ (2016) Predicting protein–ligand affinity with a random matrix framework. Proc Natl Acad Sci 113:13564–13569
42. Bajusz D, Rácz A, Héberger K (2015) Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J Cheminformatics 7(1):1–13
43. Hussin SK, Abdelmageid SM, Alkhalil A et al (2021) Handling imbalance classification virtual screening big data using machine learning algorithms. Complexity 2021
44. Branco P, Torgo L, Ribeiro RP (2017) Relevance-based evaluation metrics for multi-class imbalanced domains. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, pp 698–710
45. Ballabio D, Grisoni F, Todeschini R (2018) Multivariate comparison of classification performance measures. Chemometr Intell Lab Syst 174:33–44
46. Schubert S, Dalhoff A (2012) Activity of moxifloxacin, imipenem, and ertapenem against Escherichia coli, Enterobacter cloacae, Enterococcus faecalis, and Bacteroides fragilis in monocultures and mixed cultures in an in vitro pharmacokinetic/pharmacodynamic model simulating concentrations in the human pancreas. Antimicrob Agents Chemother 56(12):6434–6436
47. Marie MAM, Krishnappa LG, Lory S (2016) In vitro activity and the efficacy of arbekacin, cefminox, fosfomycin, biapenem against gram-negative organisms: new treatment options? Proceedings of the National Academy of Sciences, India Section B: Biological Sciences 86(3):749–755
48. Goto S, Sakamoto H, Ogawa M et al (1982) Bactericidal activity of cefazolin, cefoxitin, and cefmetazole against Escherichia coli and Klebsiella pneumoniae. Chemotherapy 28(1):18–25
49. Russell DG (2001) Mycobacterium tuberculosis: here today, and here tomorrow. Nat Rev Mol Cell Biol 2(8):569–578
50. Brenner DJ, Farmer JJ III (2015) Enterobacteriaceae. Bergey’s manual of systematics of archaea and bacteria, pp 1–24
51. Fourches D, Muratov E, Tropsha A (2010) Trust, but verify: on the importance of chemical structure curation in cheminformatics and QSAR modeling research. J Chem Inf Model 50(7):1189
52. Williams AJ, Ekins S, Tkachenko V (2012) Towards a gold standard: regarding quality in public domain chemistry databases and approaches to improving the situation. Drug Discov Today 17(13–14):685–701
53. Richter MF, Drown BS, Riley AP et al (2017) Predictive compound accumulation rules yield a broad-spectrum antibiotic. Nature 545(7654):299–304
54. Ebejer JP, Charlton MH, Finn PW (2016) Are the physicochemical properties of antibacterial compounds really different from other drugs? J Cheminformatics 8(1):1–9

Acknowledgements

The authors would like to thank Dr. Jean Paul Ebejer, University of Malta, for his valuable suggestions to improve the manuscript.

Author information

Authors and Affiliations

School of Computer Science, University of Buckingham, Hunter Street, Buckingham, MK18 1EG, UK
Rishi Jagdev & Paul W. Finn
University of West London, St Mary’s Road, Ealing, London, W5 5RF, UK
Thomas Bruun Madsen
Oxford Drug Design Ltd., Oxford Centre for Innovation, New Road, Oxford, OX1 1BY, UK
Paul W. Finn

Corresponding author

Correspondence to Rishi Jagdev.

Ethics declarations

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Conflict of interest

The authors declare no competing interests.

Additional information

Supplementary information

The online version contains supplementary material available at https://doi.org/10.1007/s00894-022-05359-6.

Author contribution

All authors contributed to the study conception and design. RJ performed the experimental analysis and wrote the first draft of the main manuscript. TM contributed to the mathematical interpretation of all the ML methods and reviewed the manuscript. PF contributed substantially to the conception of the experiments and critically reviewed and revised the manuscript. All authors read and approved the final manuscript.

Data availability statement

The training and test datasets are available as supplementary information with the publication by Stokes et al. [25, Supplementary Tables S2A, S2B]. The code for all three methods discussed in the paper is publicly available at the following links:

Chemprop [25]: https://github.com/swansonk14/chemprop
Mapping of Activity through Dichotomic Scores (MADS) [26]: https://michem.unimib.it/download/matlab-toolboxes/virtual-screening-toolbox-for-matlab/
Random Matrix Theory (RMT) [27]: https://github.com/alphaleegroup/RandomMatrixDiscriminant

Institutional review board statement

Not applicable.

Informed consent statement

Not applicable.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Thomas Bruun Madsen and Paul W. Finn contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jagdev, R., Madsen, T.B. & Finn, P.W. On the ability of machine learning methods to discover novel scaffolds. J Mol Model 29, 22 (2023). https://doi.org/10.1007/s00894-022-05359-6

Received: 12 August 2022
Accepted: 21 October 2022
Published: 27 December 2022
DOI: https://doi.org/10.1007/s00894-022-05359-6
