Active learning strategies with COMBINE analysis: new tricks for an old dog
The COMBINE method was designed to study congeneric series of compounds using structural information from ligand–protein complexes. Although very successful, the method has not received the same level of attention as other approaches to Quantitative Structure–Activity Relationships (QSAR), mainly because of the lack of ways to measure the uncertainty of its predictions and the need for large datasets. Active learning, a semi-supervised learning approach that uses uncertainty to enhance model performance while reducing the size of the training set, has been used in this work to address both problems. We propose two estimators of uncertainty: the pool of regressors and the distance to the training set. The performance of the methods has been evaluated by testing the resulting active learning workflows on three diverse datasets: HIV-1 protease inhibitors, taxol derivatives and BRD4 inhibitors. The proposed strategies were successful in 80% of the cases for the taxol derivatives and BRD4 inhibitors, and outperformed random selection on the time-split of the HIV-1 protease inhibitors. Our results suggest that AL-COMBINE might be an effective way of producing consistently superior QSAR models with a limited number of samples.
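As an illustration of the two uncertainty estimators named above, the following is a minimal sketch rather than the authors' implementation. It assumes that the "pool of regressors" can be approximated by the spread of predictions from models fitted on bootstrap resamples, and that the "distance to the training set" is the Euclidean distance to the nearest training sample. The ridge regressor, the function names and the synthetic descriptors are all hypothetical stand-ins; the abstract does not specify the regressors or the distance metric used in the workflow.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression (stand-in regressor); returns weights plus bias."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict(w, X):
    """Predict with weights returned by fit_ridge."""
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w

def pool_uncertainty(X_train, y_train, X_pool, n_models=20, rng=None):
    """Std. dev. of predictions across regressors fitted on bootstrap resamples."""
    rng = np.random.default_rng(rng)
    n = X_train.shape[0]
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # resample training set with replacement
        w = fit_ridge(X_train[idx], y_train[idx])
        preds.append(predict(w, X_pool))
    return np.std(preds, axis=0)

def distance_uncertainty(X_train, X_pool):
    """Euclidean distance from each pool sample to its nearest training sample."""
    diff = X_pool[:, None, :] - X_train[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)

def select_next(scores, batch_size=1):
    """Indices of the pool samples with the highest uncertainty scores."""
    return np.argsort(scores)[::-1][:batch_size]

# Usage on synthetic descriptors (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(15, 5))
y_train = X_train @ rng.normal(size=5) + 0.1 * rng.normal(size=15)
X_pool = rng.normal(size=(50, 5))

by_pool = select_next(pool_uncertainty(X_train, y_train, X_pool, rng=1), batch_size=3)
by_dist = select_next(distance_uncertainty(X_train, X_pool), batch_size=3)
print("picked by pool of regressors:", by_pool)
print("picked by distance to training set:", by_dist)
```

In an active learning loop, the selected pool samples would be measured, moved into the training set, and the model refitted. Random selection, the baseline mentioned above, corresponds to picking pool indices uniformly at random instead.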
Keywords: COMBINE, QSAR, HIV, Taxanes, Protease, BRD4, Active learning, Regression
Partial least squares
Support vector machine regression
Quantitative structure–activity relationships
COMparative binding energy analysis
Classic molecular mechanism implicit solvent model surface access
Human immunodeficiency virus
Bromodomain-containing protein 4 N-terminal bromodomain
In memoriam Dr. Angel Ramirez Ortiz (1966–2008). We thank Prof. Dr. Federico Gago for providing the historical HIV-PR and taxanes data sets.