Improving machine learning in early drug discovery

Bendtsen, Claus; Degasperi, Andrea; Ahlberg, Ernst; Carlsson, Lars

doi:10.1007/s10472-017-9541-2

Published: 18 March 2017

Volume 81, pages 155–166, (2017)

Annals of Mathematics and Artificial Intelligence

Claus Bendtsen¹,
Andrea Degasperi²,
Ernst Ahlberg³ &
…
Lars Carlsson ORCID: orcid.org/0000-0001-9491-4134⁴

Abstract

The high cost of new medicines is hindering their development, and machine learning is therefore being used to avoid carrying out physical experiments. Here, we compare three machine learning approaches in a classification setting where learning and prediction follow a teaching schedule that mimics the drug discovery process. The approaches are standard SVM classification, SVM-based multi-kernel classification, and SVM classification based on learning using privileged information. Our two main conclusions are derived using experimental in-vitro data and compound structure descriptors. The in-vitro data are assumed to be i) completely absent in the standard SVM setting, ii) available at all times when applying multi-kernel learning, or iii) available as privileged information during training only. The structure descriptors are always available. The first conclusion is that multi-kernel learning has higher odds than standard SVM of producing higher accuracy. The second is that learning using privileged information does not have higher odds than standard SVM, although it may improve accuracy when the training sets are small.
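As a rough illustration of the multi-kernel setting, where both structure descriptors and in-vitro measurements are available for every compound, the sketch below builds one Gram matrix per data source and blends them with a fixed weight. It is a minimal sketch under stated assumptions, not the authors' implementation: the RBF kernel, the fixed convex weight, the `gamma` values, the function names, and the toy data are all hypothetical choices made for illustration.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(rows, gamma=1.0):
    """Pairwise kernel values for a list of feature vectors."""
    return [[rbf_kernel(r, s, gamma) for s in rows] for r in rows]


def combine_kernels(k_a, k_b, weight=0.5):
    """Convex combination of two Gram matrices; weight must lie in [0, 1]."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [
        [weight * a + (1.0 - weight) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(k_a, k_b)
    ]


# Hypothetical toy data: three compounds, each described by two views.
structure = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # descriptor bits
in_vitro = [[0.2], [0.9], [0.3]]                                  # assay readout

k_structure = gram_matrix(structure, gamma=0.5)  # standard SVM would use only this
k_in_vitro = gram_matrix(in_vitro, gamma=2.0)    # second view, multi-kernel setting
k_combined = combine_kernels(k_structure, k_in_vitro, weight=0.5)
```

A convex combination of valid kernels is itself a valid kernel, so `k_combined` could be passed to any SVM implementation that accepts a precomputed kernel matrix. Multiple kernel learning methods typically learn the weight rather than fixing it, and the privileged-information setting needs a different training formulation that this sketch does not cover.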

Acknowledgments

AD is supported by the Science Foundation Ireland Industry Fellowship No. 15/IFA/2925.

Author information

Authors and Affiliations

AstraZeneca, Innovative Medicines & Early Development, Quantitative Biology, Discovery Sciences, Cambridge Science Park, Cambridge, CB4 0WG, UK
Claus Bendtsen
University College Dublin Systems Biology Ireland, Belfield, Dublin, Republic of Ireland
Andrea Degasperi
AstraZeneca, Innovative Medicines & Early Development, Predictive Compound ADME & Safety, Drug Safety & Metabolism, Pepparedsleden 1, 431 83, Mölndal, Sweden
Ernst Ahlberg
AstraZeneca, Innovative Medicines & Early Development, Quantitative Biology, Discovery Sciences, Pepparedsleden 1, 431 83, Mölndal, Sweden
Lars Carlsson

Corresponding author

Correspondence to Claus Bendtsen.

About this article

Cite this article

Bendtsen, C., Degasperi, A., Ahlberg, E. et al. Improving machine learning in early drug discovery. Ann Math Artif Intell 81, 155–166 (2017). https://doi.org/10.1007/s10472-017-9541-2

Published: 18 March 2017
Issue Date: October 2017
DOI: https://doi.org/10.1007/s10472-017-9541-2
