Inductive Queries for a Drug Designing Robot Scientist

King, Ross D.; Schierz, Amanda; Clare, Amanda; Rowland, Jem; Sparkes, Andrew; Nijssen, Siegfried; Ramon, Jan

doi:10.1007/978-1-4419-7738-0_18

Abstract

It is increasingly clear that machine learning algorithms need to be integrated into an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and the computer provides guidance for the experiments being performed. In this chapter, we summarise several key challenges in integrating machine learning and data mining algorithms into methods for the discovery of Quantitative Structure–Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated. We then discuss how to represent molecular data so that knowledge discovery tools can analyse it, and how to adapt machine learning and data mining algorithms to guide QSAR experiments.

Author information

Authors and Affiliations

Department of Computer Science, Llandinam Building, Aberystwyth University, Ceredigion, SY23 3DB, Aberystwyth, United Kingdom
Ross D. King, Amanda Clare, Jem Rowland & Andrew Sparkes
2DEC, Poole House, Bournemouth University, BH12 5BB, Poole, Dorset, United Kingdom
Amanda Schierz
Departement Computerwetenschappen, Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001, Leuven, Belgium
Siegfried Nijssen & Jan Ramon

Corresponding author

Correspondence to Ross D. King.

Editor information

Editors and Affiliations

Department of Knowledge Technologies, Jožef Stefan Institute, Jamova 39, Ljubljana, 1000, Slovenia
Sašo Džeroski
Mathematics and Computer Science, University of Antwerp, Middelheimlaan 1, Antwerpen, B-2020, Belgium
Bart Goethals
Dept. of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, Ljubljana, SI-1000, Slovenia
Panče Panov

About this chapter

Cite this chapter

King, R.D. et al. (2010). Inductive Queries for a Drug Designing Robot Scientist. In: Džeroski, S., Goethals, B., Panov, P. (eds) Inductive Databases and Constraint-Based Data Mining. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7738-0_18

DOI: https://doi.org/10.1007/978-1-4419-7738-0_18
Published: 18 November 2010
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-7737-3
Online ISBN: 978-1-4419-7738-0
eBook Packages: Computer Science, Computer Science (R0)