Abstract
Absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties are important in drug discovery because they determine a compound's efficacy and safety. In this work, we combined an ensemble of features, including fingerprints and descriptors, with a tree-based machine learning model, extreme gradient boosting (XGBoost), for accurate ADMET prediction. Our model performs well on the Therapeutics Data Commons (TDC) ADMET benchmark group: of the 22 tasks, it ranks first in 18 and in the top 3 in 21. The trained machine learning models are integrated into ADMETboost, a web server that is publicly available at https://ai-druglab.smu.edu/admet.
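The idea of concatenating several feature blocks and fitting a boosted tree ensemble on the result can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library and swaps XGBoost for plain gradient boosting with depth-1 regression stumps and squared-error loss. The fingerprint bits, descriptor values, and target values are invented toy numbers, and the function names (`concat_features`, `fit_stump`, `fit_boosting`, `predict`) are hypothetical.

```python
"""Toy gradient boosting on concatenated feature blocks (standard library only)."""


def concat_features(*blocks):
    """Join per-molecule feature blocks (e.g. fingerprint bits + descriptors) row by row."""
    return [sum((list(row) for row in rows), []) for rows in zip(*blocks)]


def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # a split must put samples on both sides
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:] if best else None


def fit_boosting(X, y, n_rounds=50, learning_rate=0.3):
    """Fit stumps sequentially to the residuals of the current prediction."""
    base = sum(y) / len(y)  # start from the mean target
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, t, lm, rm = stump
        stumps.append(stump)
        preds = [p + learning_rate * (lm if row[j] <= t else rm)
                 for row, p in zip(X, preds)]
    return base, learning_rate, stumps


def predict(model, X):
    """Sum the base value and the shrunken contribution of every stump."""
    base, lr, stumps = model
    return [base + sum(lr * (lm if row[j] <= t else rm) for j, t, lm, rm in stumps)
            for row in X]


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Invented toy data: 4 molecules, 3 fingerprint bits and 2 descriptors each.
fingerprints = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
descriptors = [[0.5, 120.0], [1.2, 250.0], [0.9, 180.0], [2.1, 310.0]]
y = [-4.8, -5.6, -5.1, -6.0]  # invented regression targets

X = concat_features(fingerprints, descriptors)
model = fit_boosting(X, y)
baseline_mse = mse(y, [sum(y) / len(y)] * len(y))  # error of predicting the mean
train_mse = mse(y, predict(model, X))
print(f"baseline MSE {baseline_mse:.4f}, boosted training MSE {train_mse:.4f}")
```

In practice, a dedicated library such as XGBoost adds regularization, deeper trees, and efficient split finding that this sketch omits. Real fingerprints and descriptors would come from a cheminformatics toolkit rather than hand-written lists.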
Acknowledgements
Computational time was generously provided by Southern Methodist University’s Center for Research Computing. The preprint version of this work is available on arXiv (identifier 2204.07532) under a CC BY-NC-ND 4.0 license.
Funding
Research reported in this paper was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award No. R15GM122013.
Ethics declarations
Ethics approval
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Author contribution
HT and RK conducted the experiments. HT plotted the figures. All authors revised the manuscript.
Availability of data and materials
The data used in this study are publicly available in the TDC ADMET benchmark group at https://tdcommons.ai/benchmark/admet_group/overview/. The datasets can be downloaded through the TDC Python package (v0.3.6). The default training and testing splits were used for model training. The related code, the model parameters for each task, and the ready-to-use featurization results are shared on GitHub at https://github.com/smu-tao-group/ADMET_XGBoost. The web server can be accessed at https://ai-druglab.smu.edu/admet.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Hao Tian and Rajas Ketkar contributed equally to this work.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tian, H., Ketkar, R. & Tao, P. ADMETboost: a web server for accurate ADMET prediction. J Mol Model 28, 408 (2022). https://doi.org/10.1007/s00894-022-05373-8
DOI: https://doi.org/10.1007/s00894-022-05373-8