A Combinatorial Computational Approach for Drug Discovery Against AIDS: Machine Learning and Proteochemometrics

  • Sofia D’souza
  • Prema K. V.
  • Seetharaman Balaji


Computational methods are widely used in drug discovery, including the identification of novel targets, the study of drug-target interactions, and the virtual screening of compounds against known targets. Machine learning techniques have been used to predict novel targets and drugs with greater accuracy than other methods. In the context of HIV-1, machine learning algorithms have also been widely used to predict disease progression, viral resistance to drugs, treatment efficacy, and the effectiveness of combination therapy. In this article, we focus on some of the machine learning techniques applied to viral disease. In brief, machine learning methods have great potential in drug discovery, drug repurposing, and precision medicine.


Keywords: Human immunodeficiency virus-1 (HIV-1); Machine learning (ML); Support vector machines (SVM); Decision tree (DT); Random Forest (RF); Artificial neural network (ANN); Proteochemometric modeling (PCM)



The corresponding author acknowledges the grant (No. VGST/GRD-533/2016-17/241) received from Karnataka Science and Technology Promotion Society (KSTePS), India, for supporting the ‘Centre for Interactive Biomolecular 3D-literacy (C-in-3D)’ under the VGST scheme – Centres of Innovative Science, Engineering and Education (CISEE) for the year 2016-17.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Sofia D’souza
    • 1
  • Prema K. V.
    • 1
  • Seetharaman Balaji
    • 2
  1. Department of Computer Science & Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
  2. Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
