Predicting Mouse Liver Microsomal Stability with “Pruned” Machine Learning Models and Public Data
- Cite this article as: Perryman, A.L., Stratton, T.P., Ekins, S. et al. Pharm Res (2016) 33: 433. doi:10.1007/s11095-015-1800-5
Mouse efficacy studies are a critical hurdle in advancing translational research on potential therapeutic compounds for many diseases. Although mouse liver microsomal (MLM) stability studies are not a perfect surrogate for in vivo studies of metabolic clearance, they are the initial model system used to assess metabolic stability. Consequently, we explored the development of machine learning models that can enhance the probability of identifying compounds possessing MLM stability.
Published assays on MLM half-life values were identified in PubChem, reformatted, and curated to create a training set with 894 unique small molecules. These data were used to construct machine learning models assessed with internal cross-validation, external tests with a published set of antitubercular compounds, and independent validation with an additional diverse set of 571 compounds (PubChem data on percent metabolism).
“Pruning” out the moderately unstable / moderately stable compounds from the training set produced models with superior predictive power. Bayesian models displayed the best predictive power for identifying compounds with a half-life ≥1 h.
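The pruning step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: only the 1 h (60 min) stability threshold comes from the text, while the 30 min "unstable" cutoff, the `Compound` record, and the `prune_training_set` helper are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass

# Cutoffs in minutes. Only the 60 min (1 h) "stable" threshold comes from
# the text; the 30 min "unstable" cutoff is a hypothetical placeholder.
STABLE_MIN = 60.0
UNSTABLE_MAX = 30.0


@dataclass(frozen=True)
class Compound:
    """Hypothetical record: a structure plus its measured MLM half-life."""
    smiles: str
    half_life_min: float


def prune_training_set(compounds, stable_min=STABLE_MIN, unstable_max=UNSTABLE_MAX):
    """Return (compound, label) pairs, dropping the intermediate band.

    label is True when half-life >= stable_min and False when
    half-life <= unstable_max. Compounds strictly between the two
    cutoffs (moderately stable / moderately unstable) are removed.
    """
    if unstable_max >= stable_min:
        raise ValueError("unstable_max must be below stable_min")
    labeled = []
    for c in compounds:
        if c.half_life_min >= stable_min:
            labeled.append((c, True))
        elif c.half_life_min <= unstable_max:
            labeled.append((c, False))
        # Anything else falls in the intermediate band and is pruned.
    return labeled


# Toy data with made-up half-lives, for illustration only.
toy = [
    Compound("CCO", 120.0),
    Compound("c1ccccc1", 45.0),
    Compound("CC(=O)O", 10.0),
    Compound("CCN", 60.0),
]
pruned = prune_training_set(toy)
```

In this toy set the 45 min compound is dropped, leaving a training set with a wider gap between the two classes, which is the intuition behind pruning.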
Our results suggest the pruning strategy may be of general benefit for improving test set enrichment and for providing machine learning models with enhanced predictive value for the MLM stability of small organic molecules. This study represents the most exhaustive application to date of machine learning approaches to MLM data from public sources.
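To make "test set enrichment" and positive predictive value concrete, here is a small Python sketch. The abstract does not define enrichment, so the definition used here (PPV divided by the fraction of truly stable compounds in the test set) is an assumption, and the function names and toy labels are hypothetical.

```python
def positive_predictive_value(predicted, actual):
    """PPV = TP / (TP + FP), where True means 'stable'."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    if tp + fp == 0:
        raise ValueError("no compounds were predicted stable")
    return tp / (tp + fp)


def enrichment(predicted, actual):
    """Assumed definition: PPV divided by the base rate of stable compounds."""
    base_rate = sum(1 for a in actual if a) / len(actual)
    if base_rate == 0:
        raise ValueError("test set contains no stable compounds")
    return positive_predictive_value(predicted, actual) / base_rate


# Toy labels for eight test compounds, for illustration only.
predicted = [True, True, False, False, True, False, False, False]
actual = [True, True, False, False, False, True, False, False]

ppv = positive_predictive_value(predicted, actual)  # 2 true positives, 1 false positive
ef = enrichment(predicted, actual)                  # base rate is 3 of 8 compounds
```

Under this definition, a value above 1 means that compounds flagged as stable by the model are more likely to be stable than compounds drawn at random from the test set.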
Key Words: Bayesian model; machine learning; metabolic stability; mouse liver microsomal stability; translational research
ADME/Tox: Absorption, metabolism, distribution, excretion and toxicity
CDD: Collaborative Drug Discovery
FCFP_6: Molecular function class fingerprints of maximum diameter 6
HLM: Human liver microsomal stability
HTS: High Throughput Screens
PPV: Positive predictive value
QSAR: Quantitative Structure-Activity Relationships
SAR: Structure Activity Relationship
SVM: Support Vector Machine