Abstract
We consider the following problem: a researcher has identified a small number of molecules with a certain property of interest and now wants to find further molecules sharing this property in a database. This can be described as learning molecular classes from small numbers of positive examples. In this work, we propose a method based on learning a graph grammar for the molecular class. We use the type of graph grammar proposed by Althaus et al. [2] because it is easy to interpret and allows relatively efficient queries. We identify rules that occur frequently in the positive examples and use them to construct a graph grammar. We then classify a molecule as belonging to the class if it matches the computed graph grammar. We evaluate our method on several known groups of molecules defined by structural properties and show that it achieves low false-negative and false-positive rates.
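To make the learn-then-match idea concrete, here is a minimal Python sketch under strong simplifying assumptions. It is not the graph grammar of Althaus et al. [2] and not the authors' implementation: a "rule" is reduced to a labeled bond pattern (element, element, bond order), and "matching the grammar" is reduced to requiring that every bond pattern of a candidate molecule occurs in enough of the positive examples. The function names `learn_rules`, `matches` and `error_rates`, the `min_support` threshold and the toy molecules (heavy atoms only) are all hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: atom element symbols plus bonds (i, j, bond order).
Molecule = Tuple[List[str], List[Tuple[int, int, int]]]
# Simplified stand-in for a grammar rule: a labeled bond pattern.
Pattern = Tuple[str, str, int]


def bond_patterns(mol: Molecule) -> FrozenSet[Pattern]:
    """Return the set of labeled bond patterns occurring in a molecule."""
    atoms, bonds = mol
    patterns = set()
    for i, j, order in bonds:
        a, b = sorted((atoms[i], atoms[j]))  # orientation-free pattern
        patterns.add((a, b, order))
    return frozenset(patterns)


def learn_rules(positives: List[Molecule], min_support: float) -> FrozenSet[Pattern]:
    """Keep patterns that occur in at least a min_support fraction of positives."""
    counts = Counter()
    for mol in positives:
        counts.update(bond_patterns(mol))  # each pattern counted once per molecule
    threshold = min_support * len(positives)
    return frozenset(p for p, c in counts.items() if c >= threshold)


def matches(mol: Molecule, rules: FrozenSet[Pattern]) -> bool:
    """Accept a molecule only if all of its bond patterns are covered by the rules."""
    return bond_patterns(mol) <= rules


def error_rates(rules, positives, negatives):
    """Return (false-negative rate on positives, false-positive rate on negatives)."""
    fn = sum(not matches(m, rules) for m in positives)
    fp = sum(matches(m, rules) for m in negatives)
    return fn / len(positives), fp / len(negatives)


# Toy data: alcohols as the positive class (hypothetical example).
methanol = (["C", "O"], [(0, 1, 1)])
ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
propanol = (["C", "C", "C", "O"], [(0, 1, 1), (1, 2, 1), (2, 3, 1)])
acetaldehyde = (["C", "C", "O"], [(0, 1, 1), (1, 2, 2)])
methylamine = (["C", "N"], [(0, 1, 1)])
ethane = (["C", "C"], [(0, 1, 1)])

positives = [methanol, ethanol, propanol]
negatives = [acetaldehyde, methylamine, ethane]

rules = learn_rules(positives, min_support=0.5)
print(sorted(rules))  # [('C', 'C', 1), ('C', 'O', 1)]
print(error_rates(rules, positives, negatives))  # (0.0, 0.3333333333333333)
```

Ethane is accepted because its only bond pattern also occurs in the alcohols; a purely local pattern set cannot rule out this kind of false positive, whereas a grammar that also constrains how rules combine can in principle be stricter. Raising `min_support` shrinks the rule set and trades false positives for false negatives: with `min_support=1.0` only the C-O single bond survives, so ethanol and propanol are rejected.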
Supported by the German Research Foundation (DFG), project number 416768284.
References
Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous systems (2015). https://www.tensorflow.org/, software available from tensorflow.org
Althaus, E., Hildebrandt, A., Mosca, D.: Graph rewriting based search for molecular structures: definitions, algorithms, hardness. In: Software Technologies: Applications and Foundations - STAF 2017 Collocated Workshops, Marburg, Germany, 17–21 July 2017, Revised Selected Papers, pp. 43–59 (2017). https://doi.org/10.1007/978-3-319-74730-9_5
Bajusz, D., Rácz, A., Héberger, K.: Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J. Cheminformatics 7(1), 20 (2015). https://doi.org/10.1186/s13321-015-0069-3
Friesner, R.A., et al.: Extra precision Glide: docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes. J. Med. Chem. 49(21), 6177–6196 (2006). https://doi.org/10.1021/jm051256o
Gohlke, H., Klebe, G.: Approaches to the description and prediction of the binding affinity of small-molecule ligands to macromolecular receptors. Angewandte Chemie Int. Ed. 41(15), 2644–2676 (2002). https://doi.org/10.1002/1521-3773, https://onlinelibrary.wiley.com/doi/abs/10.1002/1521-3773
Hirohara, M., Saito, Y., Koda, Y., Sato, K., Sakakibara, Y.: Convolutional neural network based on SMILES representation of compounds for detecting chemical motif. BMC Bioinform. 19(Suppl 19), 526 (2018). https://doi.org/10.1186/s12859-018-2523-5, https://www.ncbi.nlm.nih.gov/pubmed/30598075
Hoffmann, T., Gastreich, M.: The next level in chemical space navigation: going far beyond enumerable compound libraries. Drug Discov. Today 24(5), 1148–1156 (2019). https://doi.org/10.1016/j.drudis.2019.02.013, http://www.sciencedirect.com/science/article/pii/S1359644618304471
Kim, S., et al.: PubChem substance and compound databases. Nucleic Acids Res. 44(D1), D1202–D1213 (2016)
National Center for Advancing Translational Sciences (NCATS): Tox21 data challenge 2014 (2014). https://tripod.nih.gov/tox21/challenge/
O’Boyle, N., Dalke, A.: DeepSMILES: an adaptation of SMILES for use in machine-learning of chemical structures (2018). https://doi.org/10.26434/chemrxiv.7097960.v1, https://chemrxiv.org/articles/DeepSMILES_An_Adaptation_of_SMILES_for_Use_in_Machine-Learning_of_Chemical_Structures/7097960
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Rogers, D.J., Tanimoto, T.T.: A computer program for classifying plants. Science 132(3434), 1115–1118 (1960). https://doi.org/10.1126/science.132.3434.1115, https://science.sciencemag.org/content/132/3434/1115
Schellhammer, I., Rarey, M.: FlexX-Scan: fast, structure-based virtual screening. Proteins: Struct. Funct. Bioinform. 57(3), 504–517 (2004). https://doi.org/10.1002/prot.20217, https://onlinelibrary.wiley.com/doi/abs/10.1002/prot.20217
Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. CoRR abs/1409.3215 (2014). http://arxiv.org/abs/1409.3215
Xiang, M., Cao, Y., Fan, W., Chen, L., Mo, Y.: Computer-aided drug design: lead discovery and optimization. Comb. Chem. High Throughput Screening 15, 328–337 (2012). https://doi.org/10.2174/138620712799361825
Šípek, V., Holubová, I., Svoboda, M.: Comparison of approaches for querying chemical compounds. In: Gadepally, V., et al. (eds.) DMAH/Poly 2019. LNCS, vol. 11721, pp. 204–221. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-33752-0_15
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Althaus, E., Hildebrandt, A., Mosca, D. (2021). Learning Molecular Classes from Small Numbers of Positive Examples Using Graph Grammars. In: Martín-Vide, C., Vega-Rodríguez, M.A., Wheeler, T. (eds) Algorithms for Computational Biology. AlCoB 2021. Lecture Notes in Computer Science, vol 12715. Springer, Cham. https://doi.org/10.1007/978-3-030-74432-8_1
DOI: https://doi.org/10.1007/978-3-030-74432-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74431-1
Online ISBN: 978-3-030-74432-8
eBook Packages: Computer Science, Computer Science (R0)