Topic Models with Relational Features for Drug Design

Faruquie, Tanveer A.; Srinivasan, Ashwin; King, Ross D.

doi:10.1007/978-3-642-38812-5_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7842)

Included in the following conference series:

International Conference on Inductive Logic Programming

Abstract

To date, ILP models in drug design have largely focussed on models in first-order logic that relate the two- or three-dimensional molecular structure of a potential drug (a ligand) to its activity (for example, inhibition of some protein). In modelling terms: (a) the models have largely been logic-based (although there have been some attempts at probabilistic models); (b) the models have been mostly discriminative (they have been used mainly for classification tasks); and (c) data for the concepts to be learned are usually provided explicitly: “hidden” or latent concept learning is rare. Each of these aspects imposes certain limitations on the use of such models for drug design. Here, we propose the use of “topic models” (more correctly, hierarchical Bayesian models) as a general and powerful modelling technique for drug design. Specifically, we use the feature-construction capabilities of a general-purpose ILP system to incorporate complex relational information into topic models for drug-like molecules. Our main interest in this paper is to describe computational tools to assist the discovery of drugs for malaria. To this end, we describe the construction of topic models using the GlaxoSmithKline Tres Cantos Antimalarial Set (TCAMS) dataset. This consists of about 13,000 inhibitors of the 3D7 strain of P. falciparum in human erythrocytes, obtained by screening approximately 2 million compounds. We investigate the discrimination of molecules into groups (for example, “more active” and “less active”). For this task, we present evidence suggesting that, when it is important to maximise the detection of molecules with high activity (“hits”), topic-based classifiers may be better than those that operate directly on the feature-space representation of the molecules. Besides its applicability to modelling anti-malarials, topic modelling is also evidently useful as a technique for reducing the dimensionality of ILP-constructed feature spaces.
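The idea of treating each molecule as a "document" whose "words" are the relational features that hold for it can be sketched as follows. This is a minimal illustration only, not the authors' implementation: it assumes a latent Dirichlet allocation (LDA) topic model fitted by collapsed Gibbs sampling, and the feature names, molecule identifiers, and hyperparameter values are all hypothetical stand-ins for what an ILP system would construct.

```python
import random

def fit_lda(docs, n_topics, n_features, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampler for LDA (illustrative sketch).

    docs: list of lists of feature ids; each molecule is a "document" and
    each relational feature that holds for it is a "word".
    Returns theta, the per-molecule topic proportions.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # counts per molecule/topic
    topic_feat = [[0] * n_features for _ in range(n_topics)]  # counts per topic/feature
    topic_total = [0] * n_topics                              # total count per topic
    assign = []                                               # topic of each feature token

    # Random initial topic assignment for every feature occurrence.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_feat[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_feat[k][w] -= 1
                topic_total[k] -= 1
                # Resample the topic from its conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_feat[t][w] + beta)
                    / (topic_total[t] + n_features * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_feat[k][w] += 1
                topic_total[k] += 1

    # Smoothed topic proportions: a low-dimensional description of each molecule.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha)
         for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]

# Hypothetical boolean features of the kind an ILP system might construct.
FEATURES = ["has_benzene_ring", "ring_fused_to_ring", "halogen_on_ring",
            "has_amine", "has_hydroxyl", "amine_near_hydroxyl"]
feature_id = {name: i for i, name in enumerate(FEATURES)}

# Hypothetical molecules, each listed with the features that are true for it.
molecules = {
    "m1": ["has_benzene_ring", "ring_fused_to_ring", "halogen_on_ring"],
    "m2": ["has_benzene_ring", "halogen_on_ring"],
    "m3": ["has_amine", "has_hydroxyl", "amine_near_hydroxyl"],
    "m4": ["has_amine", "amine_near_hydroxyl"],
    "m5": ["has_benzene_ring", "has_amine", "has_hydroxyl"],
}
docs = [[feature_id[f] for f in feats] for feats in molecules.values()]

theta = fit_lda(docs, n_topics=2, n_features=len(FEATURES))
for name, row in zip(molecules, theta):
    print(name, [round(p, 3) for p in row])
```

Each row of `theta` replaces a molecule's six-dimensional feature vector with a two-dimensional vector of topic proportions. A classifier for "more active" versus "less active" could then be trained on these rows instead of on the raw features, which is the sense in which topic modelling acts as dimensionality reduction here.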

Author information

Authors and Affiliations

IBM Research—India, Block 4-C, Vasant Institutional Area, New Delhi, India
Tanveer A. Faruquie
Indraprastha Institute of Information Technology, Delhi (IIIT-D), New Delhi, India
Ashwin Srinivasan
School of Computer Science, University of Manchester, United Kingdom
Ross D. King

Editor information

Editors and Affiliations

Department of Mathematics and Computer Science, University of Ferrara, Via Saragat 1, 44122, Ferrara, Italy
Fabrizio Riguzzi
Department of Computer Science and Engineering, Faculty of Electrical Engineering, Czech Technical University in Prague, Karlovo namesti 13, 12135, Prague 2, Czech Republic
Filip Železný

Cite this paper

Faruquie, T.A., Srinivasan, A., King, R.D. (2013). Topic Models with Relational Features for Drug Design. In: Riguzzi, F., Železný, F. (eds) Inductive Logic Programming. ILP 2012. Lecture Notes in Computer Science, vol 7842. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38812-5_4

DOI: https://doi.org/10.1007/978-3-642-38812-5_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38811-8
Online ISBN: 978-3-642-38812-5
eBook Packages: Computer Science, Computer Science (R0)
