Learning Molecular Classes from Small Numbers of Positive Examples Using Graph Grammars

Althaus, Ernst; Hildebrandt, Andreas; Mosca, Domenico

doi:10.1007/978-3-030-74432-8_1

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 12715)

Included in the following conference series:

International Conference on Algorithms for Computational Biology

Abstract

We consider the following problem: a researcher has identified a small number of molecules with a certain property of interest and now wants to find further molecules sharing this property in a database. This can be described as learning molecular classes from small numbers of positive examples. In this work, we propose a method based on learning a graph grammar for the molecular class. We use the type of graph grammar proposed by Althaus et al. [2], as it is easy to interpret and allows relatively efficient queries. We identify rules that are frequently encountered in the positive examples and use them to construct a graph grammar. We then classify a molecule as belonging to the class if it matches the computed graph grammar. We evaluated our method on several known groups of molecules defined by structural properties and show that it achieves low false-negative and low false-positive rates.
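
The learn-then-match idea described in the abstract can be illustrated with a deliberately simplified sketch. It is not the graph grammar formalism of Althaus et al. [2]: here a "rule" is assumed to be just an atom label together with the sorted labels of its neighbours, a molecule is a toy labelled graph, and the names local_rules, learn_rules, matches and min_support are hypothetical. Rules seen in at least a given fraction of the positive examples are kept, and a molecule is accepted if every one of its local patterns is among the kept rules.

```python
from collections import Counter

# Toy molecule representation (an assumption for illustration only):
# atoms maps node id -> element symbol, bonds is a list of node pairs.

def local_rules(atoms, bonds):
    """Return the set of (centre label, sorted neighbour labels) patterns."""
    neighbours = {node: [] for node in atoms}
    for a, b in bonds:
        neighbours[a].append(atoms[b])
        neighbours[b].append(atoms[a])
    return {(atoms[n], tuple(sorted(neighbours[n]))) for n in atoms}

def learn_rules(positives, min_support=0.5):
    """Keep patterns occurring in at least min_support of the positives."""
    counts = Counter()
    for atoms, bonds in positives:
        # Each molecule contributes a pattern at most once (set semantics).
        counts.update(local_rules(atoms, bonds))
    threshold = min_support * len(positives)
    return {rule for rule, count in counts.items() if count >= threshold}

def matches(molecule, rules):
    """Accept a molecule if all of its local patterns are learned rules."""
    atoms, bonds = molecule
    return local_rules(atoms, bonds) <= rules

# Heavy-atom skeletons of a few small molecules (hydrogens omitted).
ethanol = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
propanol = ({0: "C", 1: "C", 2: "C", 3: "O"}, [(0, 1), (1, 2), (2, 3)])
butanol = ({0: "C", 1: "C", 2: "C", 3: "C", 4: "O"},
           [(0, 1), (1, 2), (2, 3), (3, 4)])
ethylamine = ({0: "C", 1: "C", 2: "N"}, [(0, 1), (1, 2)])

rules = learn_rules([ethanol, propanol], min_support=0.5)
print(matches(butanol, rules))     # unseen alcohol built from learned patterns
print(matches(ethylamine, rules))  # contains nitrogen patterns never learned
```

With these two positive examples, the unseen alcohol is accepted and the amine is rejected. Raising min_support makes the rule set stricter, which in this toy setting trades false positives for false negatives.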

Supported by the German science foundation (DFG, project number 416768284).

Notes

1. http://zinc.docking.org/subsets/all-purchasable

References

Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous systems (2015). https://www.tensorflow.org/, software available from tensorflow.org
Althaus, E., Hildebrandt, A., Mosca, D.: Graph rewriting based search for molecular structures: definitions, algorithms, hardness. In: Software Technologies: Applications and Foundations - STAF 2017 Collocated Workshops, Marburg, Germany, 17–21 July 2017, Revised Selected Papers, pp. 43–59 (2017). https://doi.org/10.1007/978-3-319-74730-9_5
Bajusz, D., Rácz, A., Héberger, K.: Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J. Cheminformatics 7(1), 20 (2015). https://doi.org/10.1186/s13321-015-0069-3
Friesner, R.A., et al.: Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes. J. Med. Chem. 49(21), 6177–6196 (2006). https://doi.org/10.1021/jm051256o
Gohlke, H., Klebe, G.: Approaches to the description and prediction of the binding affinity of small-molecule ligands to macromolecular receptors. Angewandte Chemie Int. Ed. 41(15), 2644–2676 (2002). https://doi.org/10.1002/1521-3773, https://onlinelibrary.wiley.com/doi/abs/10.1002/1521-3773
Hirohara, M., Saito, Y., Koda, Y., Sato, K., Sakakibara, Y.: Convolutional neural network based on SMILES representation of compounds for detecting chemical motif. BMC Bioinform. 19(Suppl 19), 526–526 (2018). https://doi.org/10.1186/s12859-018-2523-5, https://www.ncbi.nlm.nih.gov/pubmed/30598075
Hoffmann, T., Gastreich, M.: The next level in chemical space navigation: going far beyond enumerable compound libraries. Drug Discov. Today 24(5), 1148–1156 (2019). https://doi.org/10.1016/j.drudis.2019.02.013, http://www.sciencedirect.com/science/article/pii/S1359644618304471
Kim, S., et al.: PubChem substance and compound databases. Nucleic Acids Res. 44(D1), D1202–D1213 (2016)
National Center for Advancing Translational Sciences (NCATS): Tox21 data challenge 2014 (2014). https://tripod.nih.gov/tox21/challenge/
O’Boyle, N., Dalke, A.: DeepSMILES: an adaptation of SMILES for use in machine-learning of chemical structures (2018). https://doi.org/10.26434/chemrxiv.7097960.v1, https://chemrxiv.org/articles/DeepSMILES_An_Adaptation_of_SMILES_for_Use_in_Machine-Learning_of_Chemical_Structures/7097960
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Rogers, D.J., Tanimoto, T.T.: A computer program for classifying plants. Science 132(3434), 1115–1118 (1960). https://doi.org/10.1126/science.132.3434.1115, https://science.sciencemag.org/content/132/3434/1115
Schellhammer, I., Rarey, M.: FlexX-Scan: fast, structure-based virtual screening. Proteins Structure, Funct. Bioinform. 57(3), 504–517 (2004). https://doi.org/10.1002/prot.20217, https://onlinelibrary.wiley.com/doi/abs/10.1002/prot.20217
Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. CoRR abs/1409.3215 (2014). http://arxiv.org/abs/1409.3215
Xiang, M., Cao, Y., Fan, W., Chen, L., Mo, Y.: Computer-aided drug design: lead discovery and optimization. Comb. Chem. High Throughput Screening 15, 328–37 (2012). https://doi.org/10.2174/138620712799361825
Šípek, V., Holubová, I., Svoboda, M.: Comparison of approaches for querying chemical compounds. In: Gadepally, V., et al. (eds.) DMAH/Poly 2019. LNCS, vol. 11721, pp. 204–221. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-33752-0_15

Author information

Authors and Affiliations

Institute of Computer Science, Johannes Gutenberg University Mainz, Staudingerweg 9, 55128, Mainz, Germany
Ernst Althaus, Andreas Hildebrandt & Domenico Mosca

Corresponding authors

Correspondence to Ernst Althaus, Andreas Hildebrandt or Domenico Mosca.

Editor information

Editors and Affiliations

Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide
University of Extremadura, Cáceres, Spain
Miguel A. Vega-Rodríguez
University of Montana, Missoula, MT, USA
Travis Wheeler

About this paper

Cite this paper

Althaus, E., Hildebrandt, A., Mosca, D. (2021). Learning Molecular Classes from Small Numbers of Positive Examples Using Graph Grammars. In: Martín-Vide, C., Vega-Rodríguez, M.A., Wheeler, T. (eds) Algorithms for Computational Biology. AlCoB 2021. Lecture Notes in Computer Science, vol 12715. Springer, Cham. https://doi.org/10.1007/978-3-030-74432-8_1

DOI: https://doi.org/10.1007/978-3-030-74432-8_1
Published: 31 May 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74431-1
Online ISBN: 978-3-030-74432-8
eBook Packages: Computer Science, Computer Science (R0)
