Ontology Pre-training for Poison Prediction

  • Conference paper
KI 2023: Advances in Artificial Intelligence (KI 2023)

Abstract

Integrating human knowledge into neural networks has the potential to improve their robustness and interpretability. We have developed a novel approach to integrate knowledge from ontologies into the structure of a Transformer network, which we call ontology pre-training: we first train the network to predict membership in ontology classes as a way to embed the structure of the ontology into the network, and subsequently fine-tune it for the particular prediction task. We apply this approach to a case study in predicting the potential toxicity of a small molecule based on its molecular structure, a challenging task for machine learning in life sciences chemistry. Our approach improves on the state of the art and has several additional benefits. First, we show that the model learns to focus attention on more meaningful chemical groups when making predictions with ontology pre-training than without, paving a path towards greater robustness and interpretability. Second, the training time is reduced after ontology pre-training, indicating that the model is better placed to learn what matters for toxicity prediction with ontology pre-training than without. This strategy has general applicability as a neuro-symbolic approach to embed meaningful semantics into neural networks.
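
To make the pre-training objective more concrete, the sketch below shows one way that "membership in ontology classes" could be turned into multi-label training targets. It is an illustration under stated assumptions, not the authors' implementation: the toy is_a hierarchy, the class names, and the helpers `ancestors` and `membership_vector` are all hypothetical. A molecule is treated as a member of every class reachable through is_a links, and each target class becomes one binary label.

```python
from typing import Dict, List, Set

# Hypothetical toy is_a hierarchy (child -> direct parents).
# Assumption: chosen for illustration only, not extracted from a real ontology.
IS_A: Dict[str, List[str]] = {
    "ethanol": ["primary alcohol"],
    "primary alcohol": ["alcohol"],
    "alcohol": ["organic hydroxy compound"],
    "phenol": ["organic hydroxy compound"],
    "organic hydroxy compound": [],
}


def ancestors(node: str, is_a: Dict[str, List[str]]) -> Set[str]:
    """Collect every class reachable from `node` by following is_a links."""
    seen: Set[str] = set()
    stack = list(is_a.get(node, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def membership_vector(
    molecule: str, classes: List[str], is_a: Dict[str, List[str]]
) -> List[int]:
    """Multi-hot target: 1 where the molecule belongs to the class, else 0."""
    member_of = ancestors(molecule, is_a)
    return [1 if c in member_of else 0 for c in classes]


# Hypothetical set of classes the network would be asked to predict.
TARGET_CLASSES = ["primary alcohol", "alcohol", "organic hydroxy compound"]

ethanol_labels = membership_vector("ethanol", TARGET_CLASSES, IS_A)
phenol_labels = membership_vector("phenol", TARGET_CLASSES, IS_A)
print(ethanol_labels, phenol_labels)
```

In a setup along these lines, a network would first be trained to predict such vectors from the molecular structure, and its classification head would then be replaced and the model fine-tuned on the toxicity labels.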

Notes

  1. https://github.com/MGlauer/ChEBai.

  2. https://doi.org/10.5281/zenodo.7548313.

Author information

Corresponding author

Correspondence to Till Mossakowski.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Glauer, M., Neuhaus, F., Mossakowski, T., Hastings, J. (2023). Ontology Pre-training for Poison Prediction. In: Seipel, D., Steen, A. (eds) KI 2023: Advances in Artificial Intelligence. KI 2023. Lecture Notes in Computer Science, vol 14236. Springer, Cham. https://doi.org/10.1007/978-3-031-42608-7_4

  • DOI: https://doi.org/10.1007/978-3-031-42608-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42607-0

  • Online ISBN: 978-3-031-42608-7

  • eBook Packages: Computer Science, Computer Science (R0)