A novel artificial intelligence protocol to investigate potential leads for diabetes mellitus


Dipeptidyl peptidase-4 (DPP4) plays an important role in the regulation of diabetes mellitus (DM), and DPP4 inhibitors are potential DM drugs. We therefore developed a novel artificial intelligence (AI) protocol to screen and validate potential inhibitors from the Traditional Chinese Medicine Database. The top 10 compounds ranked by Dock Score were selected as candidates. To screen the candidates further, we predicted their activity with a number of machine learning regression models, including support vector machines, bagging, random forest and other regression algorithms, as well as deep neural network models. Traditional 2D QSAR (multiple linear regression) and 3D QSAR methods were also applied. The AI methods performed better than the traditional 2D QSAR method. We also built a framework composed of deep neural networks and a transformer to predict the binding affinity between the candidates and DPP4. The AI methods and QSAR models indicated that compound 2007_4105 was a potent inhibitor, and this compound was finally validated by molecular dynamics simulations. Taken together, the models, algorithms and results suggest that Hypecoum leptocarpum might be a potential and effective medicinal herb for the treatment of DM.
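
The comparison between a linear regression baseline and an ensemble regressor can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic (a single made-up descriptor with a nonlinear relation to a made-up activity), the base learner is a simple k-nearest-neighbour regressor, and the function names are hypothetical. It is not the authors' pipeline, descriptors or results; it only shows how bagging averages models fitted on bootstrap resamples and how mean square error (MSE) on held-out data can be used to compare the two approaches.

```python
import random
from statistics import mean


def fit_linear(xs, ys):
    """Ordinary least squares for a single descriptor: y ~ a*x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b


def fit_knn(xs, ys, k=3):
    """k-nearest-neighbour regressor: average the k closest training targets."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


def fit_bagging(xs, ys, rng, n_estimators=25):
    """Bagging: fit each base learner on a bootstrap resample, then average."""
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_knn([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)


def mse(model, xs, ys):
    """Mean square error of a fitted model on (xs, ys)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def make_data(rng, n):
    """Synthetic data (assumption): activity depends quadratically on x."""
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    ys = [x * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is reproducible
x_train, y_train = make_data(rng, 80)
x_test, y_test = make_data(rng, 40)

linear_mse = mse(fit_linear(x_train, y_train), x_test, y_test)
bagging_mse = mse(fit_bagging(x_train, y_train, rng), x_test, y_test)
print(f"linear baseline MSE: {linear_mse:.3f}")
print(f"bagged k-NN MSE:     {bagging_mse:.3f}")
```

On this deliberately nonlinear toy data the bagged model gives the lower held-out MSE, because a straight line cannot follow the curvature. With real molecular descriptors the ranking of models would have to be established empirically, for example by cross-validation.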




  • Dipeptidyl peptidase-4
  • Traditional Chinese Medicine
  • Support vector machine
  • Multiple linear regression
  • Comparative force field analysis
  • Comparative similarity indices analysis
  • Deep learning
  • Random forests
  • Molecular dynamics
  • Quantitative structure–activity relationship
  • Machine learning
  • Artificial intelligence
  • Diabetes mellitus
  • International Diabetes Federation
  • Type 2 diabetes
  • Insulin resistance
  • DPP4 inhibitor
  • Computer-aided drug design
  • Discovery Studio software
  • Chemistry at HARvard Molecular Mechanics
  • Number of H-bond acceptors
  • Number of H-bond donors
  • Gastrointestinal absorption
  • Brain–blood permeant
  • CYP2C19 inhibiting level
  • Mean square error
  • Principal component analysis
  • Adapt Boost
  • Molecular mechanics 2
  • Partial least squares
  • Linear constraint solver
  • Root mean square deviation
  • Root mean square fluctuation
  • Solvent accessible surface area
  • Mean square deviation
  • Rectified linear units
  • Least absolute shrinkage and selection operator



This work was supported by Guangzhou science and technology fund (Grant No. 201803010072), Science, Technology and Innovation Commission of Shenzhen Municipality (JCYL 20170818165305521) and China Medical University Hospital (DMR-110-097). We also acknowledge the start-up funding from SYSU “Hundred Talent Program.”

Author information




CY-CC designed the research. J-NG, GC, XC and Z-DC carried out the experiments and analyzed the data. CY-CC contributed analytic tools. J-NG, GC, XC, Z-DC, LZ and CY-CC wrote the manuscript together.

Corresponding author

Correspondence to Calvin Yu-Chian Chen.

Ethics declarations

Conflicts of interest

The authors report no conflicts of interest in this work.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 2661 KB)

About this article

Cite this article

Gong, JN., Zhao, L., Chen, G. et al. A novel artificial intelligence protocol to investigate potential leads for diabetes mellitus. Mol Divers 25, 1375–1393 (2021).


  • Dipeptidyl peptidase-4 (DPP4)
  • Quantitative structure–activity relationship (QSAR)
  • Machine learning (ML)
  • Artificial intelligence (AI)
  • Molecular dynamics simulation (MD)