Abstract
The cost of developing new drugs continues to rise, and repurposing known medications for new indications is an important way to accelerate drug discovery. One promising approach to drug repositioning is to use machine learning (ML) algorithms to learn patterns in drug-related biological data and relate those patterns to a drug's potential to treat specific diseases. Here we give an overview of the general principles and main types of ML algorithms, together with common approaches to evaluating predictive performance, with reference to the use of ML algorithms to predict repurposing opportunities from drug expression data as features. We highlight common issues and caveats in applying such models to repositioning. We also introduce resources for drug expression data and highlight recent studies that take this approach.
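As a rough illustration of the workflow summarised above, the following sketch trains a penalised classifier on drug expression features and estimates its predictive performance by cross-validation. It is a minimal example under stated assumptions, not the chapter's own implementation: the data are simulated, the choice of an L2-penalised logistic regression is ours, and every function name and parameter value (simulate, fit_logistic_ridge, cross_validated_auc, lam, lr, epochs) is hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate(n_drugs=120, n_genes=20, seed=1):
    """Simulated data (assumption): rows are drugs, columns are expression
    changes of genes; the first 3 genes are shifted for 'indicated' drugs."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_drugs):
        label = 1 if i % 4 == 0 else 0  # 25% of drugs treat the disease
        row = [rng.gauss(0, 1) for _ in range(n_genes)]
        if label:
            for j in range(3):
                row[j] += 1.5
        X.append(row)
        y.append(label)
    return X, y


def fit_logistic_ridge(X, y, lam=0.1, lr=0.1, epochs=200):
    """L2-penalised logistic regression fitted by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The penalty lam shrinks the weights; the intercept is not penalised.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_scores(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def auc(y_true, scores):
    """AUC-ROC: probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(X, y, k=5, seed=0):
    """Pool out-of-fold predictions from k-fold CV, then compute one AUC."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = [0.0] * len(X)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_logistic_ridge([X[i] for i in train], [y[i] for i in train])
        for i, s in zip(fold, predict_scores(model, [X[i] for i in fold])):
            scores[i] = s
    return auc(y, scores)


X, y = simulate()
cv_auc = cross_validated_auc(X, y)
print(f"Cross-validated AUC on simulated data: {cv_auc:.3f}")
```

Each drug is scored only by a model that never saw it during training, so the estimate is not inflated by fitting and testing on the same drugs. If the penalty strength were tuned as well, that tuning would need to sit inside each training fold (nested cross-validation) to avoid an optimistic bias.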
Acknowledgment
This work is partially supported by the Lo Kwee-Seong Biomedical Research Fund and a Direct Grant from the Chinese University of Hong Kong to HCS.
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Zhao, K., So, HC. (2019). Using Drug Expression Profiles and Machine Learning Approach for Drug Repurposing. In: Vanhaelen, Q. (eds) Computational Methods for Drug Repurposing. Methods in Molecular Biology, vol 1903. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8955-3_13
DOI: https://doi.org/10.1007/978-1-4939-8955-3_13
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8954-6
Online ISBN: 978-1-4939-8955-3
eBook Packages: Springer Protocols