Learning Molecular Classes from Small Numbers of Positive Examples Using Graph Grammars

  • Conference paper
Algorithms for Computational Biology (AlCoB 2021)

Abstract

We consider the following problem: a researcher has identified a small number of molecules with a certain property of interest and now wants to find further molecules in a database that share this property. This can be described as learning molecular classes from small numbers of positive examples. In this work, we propose a method based on learning a graph grammar for the molecular class. We use the type of graph grammar proposed by Althaus et al. [2], as it is easy to interpret and allows relatively efficient queries. We identify rules that occur frequently in the positive examples and use them to construct a graph grammar. We then classify a molecule as belonging to the class if it matches the computed graph grammar. We evaluate our method on several known groups of molecules defined by structural properties and show that it achieves low false-negative and low false-positive rates.
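The learn-then-match pipeline described above (collect frequent rules from the positive examples, then accept molecules that the learned rules can account for) can be illustrated with a deliberately simplified Python sketch. It is not the authors' method and does not implement the graph grammars of Althaus et al. [2]. As an assumption for illustration only, each bond is reduced to a hypothetical "rule" (a sorted pair of element symbols plus the bond order), rules that occur in at least a `min_support` fraction of the positive examples are kept, and a molecule is accepted only if every one of its bond patterns is among the kept rules. The names `Molecule`, `local_patterns`, `learn_rules`, `in_class`, and `min_support` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

Pattern = tuple[str, str, int]


@dataclass(frozen=True)
class Molecule:
    """Hypothetical toy molecular graph: element symbols plus bonds (i, j, order).

    Hydrogens are omitted, i.e. only heavy atoms appear as nodes.
    """
    atoms: tuple[str, ...]
    bonds: tuple[tuple[int, int, int], ...]


def local_patterns(mol: Molecule) -> set[Pattern]:
    """Stand-in for grammar rules: one (element, element, order) triple per bond.

    The two element symbols are sorted so that C-O and O-C map to the same pattern.
    """
    patterns: set[Pattern] = set()
    for i, j, order in mol.bonds:
        a, b = sorted((mol.atoms[i], mol.atoms[j]))
        patterns.add((a, b, order))
    return patterns


def learn_rules(positives: list[Molecule], min_support: float) -> set[Pattern]:
    """Keep patterns present in at least `min_support` of the positive examples."""
    counts: Counter = Counter()
    for mol in positives:
        # A set is passed, so each pattern is counted at most once per molecule.
        counts.update(local_patterns(mol))
    threshold = min_support * len(positives)
    return {p for p, c in counts.items() if c >= threshold}


def in_class(mol: Molecule, rules: set[Pattern]) -> bool:
    """Accept a molecule only if all of its local patterns are learned rules."""
    return local_patterns(mol) <= rules


if __name__ == "__main__":
    # Toy positive examples: three simple alcohols.
    ethanol = Molecule(("C", "C", "O"), ((0, 1, 1), (1, 2, 1)))
    methanol = Molecule(("C", "O"), ((0, 1, 1),))
    propanol = Molecule(("C", "C", "C", "O"), ((0, 1, 1), (1, 2, 1), (2, 3, 1)))
    rules = learn_rules([ethanol, methanol, propanol], min_support=0.5)

    butanol = Molecule(("C", "C", "C", "C", "O"),
                       ((0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1)))
    acetaldehyde = Molecule(("C", "C", "O"), ((0, 1, 1), (1, 2, 2)))  # has a C=O bond
    print(in_class(butanol, rules))       # True
    print(in_class(acetaldehyde, rules))  # False
```

In this toy setting, raising `min_support` keeps fewer rules, so fewer molecules are accepted: the false-positive rate tends to drop at the cost of more false negatives.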

Supported by the German Research Foundation (DFG, project number 416768284).

Notes

  1. http://zinc.docking.org/subsets/all-purchasable.

References

  1. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous systems (2015). https://www.tensorflow.org/, software available from tensorflow.org

  2. Althaus, E., Hildebrandt, A., Mosca, D.: Graph rewriting based search for molecular structures: definitions, algorithms, hardness. In: Software Technologies: Applications and Foundations - STAF 2017 Collocated Workshops, Marburg, Germany, 17–21 July 2017, Revised Selected Papers, pp. 43–59 (2017). https://doi.org/10.1007/978-3-319-74730-9_5

  3. Bajusz, D., Rácz, A., Héberger, K.: Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J. Cheminformatics 7(1), 20 (2015). https://doi.org/10.1186/s13321-015-0069-3

  4. Friesner, R.A., et al.: Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes. J. Med. Chem. 49(21), 6177–6196 (2006). https://doi.org/10.1021/jm051256o

  5. Gohlke, H., Klebe, G.: Approaches to the description and prediction of the binding affinity of small-molecule ligands to macromolecular receptors. Angewandte Chemie Int. Ed. 41(15), 2644–2676 (2002). https://doi.org/10.1002/1521-3773, https://onlinelibrary.wiley.com/doi/abs/10.1002/1521-3773

  6. Hirohara, M., Saito, Y., Koda, Y., Sato, K., Sakakibara, Y.: Convolutional neural network based on SMILES representation of compounds for detecting chemical motif. BMC Bioinform. 19(Suppl 19), 526–526 (2018). https://doi.org/10.1186/s12859-018-2523-5, https://www.ncbi.nlm.nih.gov/pubmed/30598075

  7. Hoffmann, T., Gastreich, M.: The next level in chemical space navigation: going far beyond enumerable compound libraries. Drug Discov. Today 24(5), 1148–1156 (2019). https://doi.org/10.1016/j.drudis.2019.02.013, http://www.sciencedirect.com/science/article/pii/S1359644618304471

  8. Kim, S., et al.: Pubchem substance and compound databases. Nucleic Acids Res. 44(D1), D1202–D1213 (2016)

  9. National Center for Advancing Translational Sciences (NCATS): Tox21 data challenge 2014 (2014). https://tripod.nih.gov/tox21/challenge/

  10. O’Boyle, N., Dalke, A.: DeepSMILES: an adaptation of SMILES for use in machine-learning of chemical structures (2018). https://doi.org/10.26434/chemrxiv.7097960.v1, https://chemrxiv.org/articles/DeepSMILES_An_Adaptation_of_SMILES_for_Use_in_Machine-Learning_of_Chemical_Structures/7097960

  11. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  12. Rogers, D.J., Tanimoto, T.T.: A computer program for classifying plants. Science 132(3434), 1115–1118 (1960). https://doi.org/10.1126/science.132.3434.1115, https://science.sciencemag.org/content/132/3434/1115

  13. Schellhammer, I., Rarey, M.: FlexX-Scan: fast, structure-based virtual screening. Proteins Structure, Funct. Bioinform. 57(3), 504–517 (2004). https://doi.org/10.1002/prot.20217, https://onlinelibrary.wiley.com/doi/abs/10.1002/prot.20217

  14. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. CoRR abs/1409.3215 (2014). http://arxiv.org/abs/1409.3215

  15. Xiang, M., Cao, Y., Fan, W., Chen, L., Mo, Y.: Computer-aided drug design: lead discovery and optimization. Comb. Chem. High Throughput Screening 15, 328–37 (2012). https://doi.org/10.2174/138620712799361825

  16. Šípek, V., Holubová, I., Svoboda, M.: Comparison of approaches for querying chemical compounds. In: Gadepally, V., et al. (eds.) DMAH/Poly -2019. LNCS, vol. 11721, pp. 204–221. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-33752-0_15

Author information

Correspondence to Ernst Althaus, Andreas Hildebrandt or Domenico Mosca.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Althaus, E., Hildebrandt, A., Mosca, D. (2021). Learning Molecular Classes from Small Numbers of Positive Examples Using Graph Grammars. In: Martín-Vide, C., Vega-Rodríguez, M.A., Wheeler, T. (eds) Algorithms for Computational Biology. AlCoB 2021. Lecture Notes in Computer Science, vol 12715. Springer, Cham. https://doi.org/10.1007/978-3-030-74432-8_1

  • DOI: https://doi.org/10.1007/978-3-030-74432-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-74431-1

  • Online ISBN: 978-3-030-74432-8

  • eBook Packages: Computer Science, Computer Science (R0)
