Active learning strategies with COMBINE analysis: new tricks for an old dog

  • Lucia Fusani
  • Alvaro Cortes Cabrera (corresponding author)


The COMBINE method was designed to study congeneric series of compounds using structural information from ligand–protein complexes. Although very successful, the method has not received the same level of attention as other approaches to Quantitative Structure–Activity Relationships (QSAR), mainly because it lacks ways to measure the uncertainty of its predictions and because it needs large datasets. Active learning, a semi-supervised learning approach that uses uncertainty to improve model performance while reducing the size of the training set, is used in this work to address both problems. We propose two estimators of uncertainty: the pool of regressors and the distance to the training set. The methods were evaluated by testing the resulting active learning workflows on three diverse datasets: HIV-1 protease inhibitors, taxol derivatives and BRD4 inhibitors. The proposed strategies were successful in 80% of the cases for the taxol derivatives and BRD4 inhibitors, and outperformed random selection in the time-split of the HIV-1 protease inhibitors. Our results suggest that AL-COMBINE might be an effective way of producing consistently superior QSAR models with a limited number of samples.
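The two uncertainty estimators named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. A toy k-nearest-neighbour regressor stands in for the PLS or SVR models. The feature vectors are arbitrary tuples rather than COMBINE interaction energies. All function names (`distance_to_training_set`, `pool_uncertainty`, `select_next`, `active_learning`) are hypothetical. The pool of regressors is assumed here to be built from bootstrap resamples of the training set, which is one common way to form such a pool.

```python
import math
import random
import statistics


def distance_to_training_set(x, train_X):
    """Uncertainty proxy 1: Euclidean distance from x to its nearest training sample."""
    return min(math.dist(x, t) for t in train_X)


def knn_predict(x, train_X, train_y, k=3):
    """Toy regressor (assumed stand-in for PLS/SVR): mean activity of the k nearest samples."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))
    return statistics.fmean(train_y[i] for i in order[:k])


def pool_uncertainty(x, train_X, train_y, n_models=20, rng=None):
    """Uncertainty proxy 2: spread of predictions across a pool of regressors,
    each fitted on a bootstrap resample of the training set (an assumption)."""
    rng = rng or random.Random(0)
    n = len(train_X)
    predictions = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        predictions.append(
            knn_predict(x, [train_X[i] for i in idx], [train_y[i] for i in idx])
        )
    return statistics.pstdev(predictions)


def select_next(pool_X, train_X, train_y, strategy="distance", rng=None):
    """Return the position in pool_X of the candidate with the highest uncertainty."""
    if strategy == "distance":
        scores = [distance_to_training_set(x, train_X) for x in pool_X]
    elif strategy == "pool":
        scores = [pool_uncertainty(x, train_X, train_y, rng=rng) for x in pool_X]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return max(range(len(pool_X)), key=scores.__getitem__)


def active_learning(X, y, n_initial=3, n_queries=5, strategy="distance", seed=0):
    """Grow a training set by repeatedly querying the most uncertain candidate.
    Returns the indices of the selected training samples, in selection order."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    rng.shuffle(indices)
    train, pool = indices[:n_initial], indices[n_initial:]
    for _ in range(min(n_queries, len(pool))):
        pick = select_next(
            [X[i] for i in pool],
            [X[i] for i in train],
            [y[i] for i in train],
            strategy=strategy,
            rng=rng,
        )
        train.append(pool.pop(pick))  # "measure" the compound and add it to training
    return train
```

Calling `active_learning(X, y, strategy="pool")` on a list of descriptor tuples `X` and activities `y` returns the indices chosen for training. Replacing `select_next` with a uniformly random pick gives the random-selection baseline against which such strategies are usually compared.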


Keywords: COMBINE, QSAR, HIV, Taxanes, Protease, BRD4, Active learning, Regression



  • Active learning
  • Partial least squares
  • Support vector machine regression
  • Quantitative structure–activity relationships
  • COMparative BINding Energy analysis
  • Classic molecular mechanism implicit solvent model surface access
  • Human immunodeficiency virus
  • Bromodomain-containing protein 4 N-terminal bromodomain



In memoriam Dr. Angel Ramirez Ortiz (1966–2008). We thank Prof. Dr. Federico Gago for providing the historical HIV-PR and taxanes data sets.

Supplementary material

Supplementary material 1: 10822_2018_181_MOESM1_ESM.docx (DOCX, 9804 KB)



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Molecular Design UK, GSK Medicines Research Centre, Stevenage, UK
  2. Data Science and Computational Chemistry, Tres Cantos, Spain
