Ontology Pre-training for Poison Prediction

Glauer, Martin; Neuhaus, Fabian; Mossakowski, Till; Hastings, Janna

doi:10.1007/978-3-031-42608-7_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14236)

Included in the following conference series:

German Conference on Artificial Intelligence (Künstliche Intelligenz)

Abstract

Integrating human knowledge into neural networks has the potential to improve their robustness and interpretability. We have developed a novel approach, which we call ontology pre-training, to integrate knowledge from ontologies into the structure of a Transformer network: we train the network to predict membership in ontology classes as a way to embed the structure of the ontology into the network, and subsequently fine-tune the network for the particular prediction task. We apply this approach to a case study in predicting the potential toxicity of a small molecule based on its molecular structure, a challenging task for machine learning in life sciences chemistry. Our approach improves on the state of the art and has several additional benefits. First, we show that the model learns to focus attention on more meaningful chemical groups when making predictions with ontology pre-training than without, paving a path towards greater robustness and interpretability. Second, the training time is reduced after ontology pre-training, indicating that the model is better placed to learn what matters for toxicity prediction with ontology pre-training than without. This strategy has general applicability as a neuro-symbolic approach to embed meaningful semantics into neural networks.
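
The two-stage schedule described in the abstract (first train on ontology class membership, then fine-tune on the target task) can be illustrated with a toy sketch. This is not the authors' implementation: a single tanh layer stands in for the Transformer encoder, and the binary features, the three "ontology classes", and the "toxicity" label are all made-up assumptions chosen only so the example runs with the Python standard library. The name `TinyModel` and its methods are hypothetical.

```python
import math
import random

def sigmoid(x):
    # Clamp the logit so math.exp cannot overflow.
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))

class TinyModel:
    """Toy stand-in: shared tanh encoder plus a replaceable sigmoid head."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.rng = rng
        self.n_hidden = n_hidden
        self.W_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                      for _ in range(n_hidden)]
        self.new_head(n_out)

    def new_head(self, n_out):
        # Swap in a fresh task head; the encoder weights are left untouched.
        self.W_head = [[self.rng.uniform(-0.5, 0.5) for _ in range(self.n_hidden)]
                       for _ in range(n_out)]

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.W_enc]
        y = [sigmoid(sum(w * hi for w, hi in zip(row, h))) for row in self.W_head]
        return h, y

    def train_step(self, x, target, lr=0.1):
        h, y = self.forward(x)
        # For sigmoid + binary cross-entropy, dLoss/dlogit = prediction - target.
        d_out = [yi - ti for yi, ti in zip(y, target)]
        # Backpropagate into the encoder before the head weights change.
        d_h = [(1.0 - h[j] ** 2)
               * sum(d_out[k] * self.W_head[k][j] for k in range(len(d_out)))
               for j in range(self.n_hidden)]
        for k, dk in enumerate(d_out):
            for j in range(self.n_hidden):
                self.W_head[k][j] -= lr * dk * h[j]
        for j, dj in enumerate(d_h):
            for i, xi in enumerate(x):
                self.W_enc[j][i] -= lr * dj * xi

def bce(model, data):
    # Mean binary cross-entropy per example, summed over output labels.
    total = 0.0
    for x, t in data:
        _, y = model.forward(x)
        for yi, ti in zip(y, t):
            yi = min(max(yi, 1e-9), 1.0 - 1e-9)
            total -= ti * math.log(yi) + (1.0 - ti) * math.log(1.0 - yi)
    return total / len(data)

rng = random.Random(0)
# Made-up "molecules": six binary substructure flags plus a constant bias input.
molecules = [[float(rng.random() < 0.5) for _ in range(6)] + [1.0]
             for _ in range(40)]
# Stage 1 targets: membership in three invented ontology classes (multi-label).
ontology_data = [(x, [x[0], float(x[1] and x[2]), float(x[3] or x[4])])
                 for x in molecules]
# Stage 2 targets: an invented binary toxicity flag.
toxicity_data = [(x, [float(x[1] and x[2])]) for x in molecules]

# Stage 1: ontology pre-training.
model = TinyModel(n_in=7, n_hidden=8, n_out=3, rng=rng)
for _ in range(200):
    for x, t in ontology_data:
        model.train_step(x, t)

# Stage 2: replace the head, keep the pre-trained encoder, fine-tune.
model.new_head(n_out=1)
loss_before = bce(model, toxicity_data)
for _ in range(200):
    for x, t in toxicity_data:
        model.train_step(x, t)
loss_after = bce(model, toxicity_data)
print("toxicity loss before/after fine-tuning:", loss_before, loss_after)
```

The point of the sketch is the structure: only the output head is replaced between the two stages, so whatever the encoder learned from the ontology labels is carried into the fine-tuning task. It makes no claim about the accuracy or training-time effects reported for the real model.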

Author information

Authors and Affiliations

Otto von Guericke University Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany
Martin Glauer, Fabian Neuhaus & Till Mossakowski
University of Zurich, Rämistrasse 71, 8006, Zürich, Switzerland
Janna Hastings

Corresponding author

Correspondence to Till Mossakowski.

Editor information

Editors and Affiliations

Universität Würzburg, Würzburg, Germany
Dietmar Seipel
University of Greifswald, Greifswald, Germany
Alexander Steen

About this paper

Cite this paper

Glauer, M., Neuhaus, F., Mossakowski, T., Hastings, J. (2023). Ontology Pre-training for Poison Prediction. In: Seipel, D., Steen, A. (eds) KI 2023: Advances in Artificial Intelligence. KI 2023. Lecture Notes in Computer Science, vol 14236. Springer, Cham. https://doi.org/10.1007/978-3-031-42608-7_4

DOI: https://doi.org/10.1007/978-3-031-42608-7_4
Published: 18 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42607-0
Online ISBN: 978-3-031-42608-7
eBook Packages: Computer Science, Computer Science (R0)
