A novel artificial intelligence protocol to investigate potential leads for diabetes mellitus

Abstract

Dipeptidyl peptidase-4 (DPP4) plays a major role in the regulation of diabetes mellitus (DM), and DPP4 inhibitors are potential DM drugs. We therefore applied a novel artificial intelligence (AI) protocol to screen for and validate potential inhibitors from the Traditional Chinese Medicine Database. The top 10 compounds ranked by Dock Score were selected as candidates. To screen the candidates further, we used a number of machine learning regression models, including support vector machines, bagging, random forest and other regression algorithms, as well as deep neural network models, to predict the activity of the candidates. In addition, traditional 2D QSAR (multiple linear regression) and 3D QSAR methods were applied. The AI methods performed better than the traditional 2D QSAR method. Moreover, we built a framework composed of deep neural networks and a transformer to predict the binding affinity between the candidates and DPP4. The AI methods and QSAR models indicated that compound 2007_4105 is a potent inhibitor, and this compound was finally validated by molecular dynamics simulations. Taking all the models, algorithms and results together, Hypecoum leptocarpum might be a potential and effective medicinal herb for the treatment of DM.
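
The comparison between a linear 2D QSAR model and a nonlinear learner described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the single descriptor, the toy activity values, and the function names (fit_line, fit_knn, loo_mse) are all invented for illustration, and a k-nearest-neighbour regressor is swapped in as a simple stand-in for the SVM, bagging, random forest and neural network models named in the abstract. Each model is scored by leave-one-out cross-validated mean square error (MSE).

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit y = a*x + b for one descriptor (a one-variable MLR)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = y_bar - a * x_bar
    return lambda x: a * x + b


def fit_knn(xs, ys, k=2):
    """k-nearest-neighbour regressor; a hypothetical stand-in for a nonlinear ML model."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


def loo_mse(fit, xs, ys):
    """Leave-one-out cross-validated mean square error of a fitting procedure."""
    errors = []
    for i in range(len(xs)):
        # Train on every compound except the i-th, then predict the held-out one.
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append((ys[i] - model(xs[i])) ** 2)
    return mean(errors)


# Toy data, invented for illustration only: the "activity" depends on the
# descriptor in a curved way, which a straight line cannot capture.
descriptor = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
activity = [(x - 3.5) ** 2 for x in descriptor]

mse_linear = loo_mse(fit_line, descriptor, activity)
mse_knn = loo_mse(fit_knn, descriptor, activity)
print(f"LOO MSE, linear fit: {mse_linear:.2f}")
print(f"LOO MSE, k-NN fit:   {mse_knn:.2f}")
```

On this deliberately nonlinear toy set the nearest-neighbour model has the lower cross-validated error. That says nothing about the real DPP4 data; it only shows how such a comparison can be scored on held-out compounds rather than on the training fit.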


Abbreviations

DPP4:

Dipeptidyl peptidase-4

TCM:

Traditional Chinese Medicine

SVM:

Support vector machine

MLR:

Multiple linear regression

CoMFA:

Comparative molecular field analysis

CoMSIA:

Comparative molecular similarity indices analysis

DL:

Deep learning

RF:

Random forests

MD:

Molecular dynamics

QSAR:

Quantitative structure–activity relationship

ML:

Machine learning

AI:

Artificial intelligence

DM:

Diabetes mellitus

IDF:

International Diabetes Federation

T2D:

Type 2 diabetes

IR:

Insulin resistance

DPP4i:

DPP4 inhibitor

CADD:

Computer-aided drug design

DS:

Discovery Studio software

CHARMm:

Chemistry at HARvard Molecular Mechanics

NHA:

Number of H-bond acceptors

NHD:

Number of H-bond donors

GIA:

Gastrointestinal absorption

BBP:

Blood–brain barrier permeant

CYL:

CYP2C19 inhibiting level

MSE:

Mean square error

PCA:

Principal component analysis

AdaBoost:

Adaptive boosting

MM2:

Molecular mechanics 2

PLS:

Partial least squares

CV:

Cross-validated

NV:

Non-cross-validated

LINCS:

Linear constraint solver

RMSD:

Root mean square deviation

RMSF:

Root mean square fluctuation

SASA:

Solvent accessible surface area

MSD:

Mean square displacement

ReLU:

Rectified linear units

Lasso:

Least absolute shrinkage and selection operator

Acknowledgements

This work was supported by Guangzhou science and technology fund (Grant No. 201803010072), Science, Technology and Innovation Commission of Shenzhen Municipality (JCYL 20170818165305521) and China Medical University Hospital (DMR-110-097). We also acknowledge the start-up funding from SYSU “Hundred Talent Program.”

Author information

Contributions

CY-CC designed the research. J-NG, GC, XC and Z-DC worked together to complete the experiments and analyze the data. CY-CC contributed analytic tools. J-NG, GC, XC, Z-DC, LZ and CY-CC wrote the manuscript together.

Corresponding author

Correspondence to Calvin Yu-Chian Chen.

Ethics declarations

Conflicts of interest

The authors report no conflicts of interest in this work.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 2661 KB)

About this article

Cite this article

Gong, JN., Zhao, L., Chen, G. et al. A novel artificial intelligence protocol to investigate potential leads for diabetes mellitus. Mol Divers 25, 1375–1393 (2021). https://doi.org/10.1007/s11030-021-10204-8

Keywords

  • Dipeptidyl peptidase-4 (DPP4)
  • Quantitative structure–activity relationship (QSAR)
  • Machine learning (ML)
  • Artificial intelligence (AI)
  • Molecular dynamics simulation (MD)