
Chemistry-Wise Augmentations for Molecule Graph Self-supervised Representation Learning

  • Conference paper
Advances in Computational Intelligence (IWANN 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14135)

Abstract

In molecular property prediction tasks, graph neural networks have become a widely used tool. Recently, self-supervised learning frameworks, especially contrastive learning, have attracted growing attention for their potential to learn molecular representations that generalize over the chemically meaningful space. Unlike supervised learning, self-supervised learning can directly leverage extensive unlabeled data, which significantly reduces the effort needed to acquire molecular property labels through costly and time-consuming simulations or experiments. However, most existing frameworks do not take into account domain-specific cheminformatics knowledge (e.g., molecular fingerprints) or multi-level molecular graph structures (e.g., functional groups).

In toxicity prediction tasks, the molecular substructure can be crucial. Structural alerts (e.g., toxicophores) are well studied and have been shown to be responsible for different types of toxicity. In this work, we propose chemistry-wise augmentations for a contrastive learning framework. Two augmentations were implemented: (1) toxicophore subgraph removal, and (2) toxicophore subgraph saving. This approach does not violate chemical principles while pushing the model to learn the parts of a molecule on which toxicity depends.
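The two augmentations can be illustrated with a minimal sketch on a toy molecular graph. This is not the authors' implementation: the graph encoding (a list of atom symbols plus index-pair bonds), the helper names `induced_subgraph`, `remove_toxicophore`, and `save_toxicophore`, and the reading of "saving" as keeping only the toxicophore atoms are all assumptions made for illustration. The toxicophore atom indices are taken as given (in practice they might come from a substructure matcher), and the sketch does not repair valences after cutting bonds.

```python
from typing import List, Set, Tuple

# Hypothetical toy encoding: (atom symbols, bonds as pairs of atom indices).
Graph = Tuple[List[str], List[Tuple[int, int]]]


def induced_subgraph(graph: Graph, keep: Set[int]) -> Graph:
    """Return the subgraph induced by the atoms in `keep`, reindexed from 0."""
    atoms, bonds = graph
    order = sorted(keep)
    new_index = {old: new for new, old in enumerate(order)}
    new_atoms = [atoms[i] for i in order]
    # A bond survives only if both of its endpoints are kept.
    new_bonds = [(new_index[a], new_index[b])
                 for a, b in bonds if a in keep and b in keep]
    return new_atoms, new_bonds


def remove_toxicophore(graph: Graph, toxicophore: Set[int]) -> Graph:
    """Augmentation (1), as assumed here: drop the toxicophore atoms."""
    all_atoms = set(range(len(graph[0])))
    return induced_subgraph(graph, all_atoms - toxicophore)


def save_toxicophore(graph: Graph, toxicophore: Set[int]) -> Graph:
    """Augmentation (2), as assumed here: keep only the toxicophore atoms."""
    return induced_subgraph(graph, set(toxicophore))


# Heavy-atom graph of nitrobenzene; the nitro group (atoms 6-8) is used
# as an illustrative structural alert.
atoms = ["C"] * 6 + ["N", "O", "O"]
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
         (0, 6), (6, 7), (6, 8)]
nitro = {6, 7, 8}

removed_view = remove_toxicophore((atoms, bonds), nitro)
saved_view = save_toxicophore((atoms, bonds), nitro)
```

In a contrastive setup, views like `removed_view` and `saved_view` would be encoded by a GNN and compared with the original molecule; how the pairs are formed is not specified in the abstract and is left open here.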

Experiments showed that the novel augmentations are more efficient than the random subgraph masking approach commonly used in molecular contrastive learning. A performance comparison with other GNN-based frameworks is also carried out.
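For contrast, the random subgraph masking baseline can be sketched as follows. This is one common variant, written under assumptions: a random start atom is chosen and the mask grows breadth-first until a target fraction of atoms is covered. The function name `random_subgraph_mask`, the `ratio` default, and the toy graph are hypothetical and are not taken from the paper.

```python
import random
from collections import deque
from typing import List, Optional, Set, Tuple


def random_subgraph_mask(num_atoms: int,
                         bonds: List[Tuple[int, int]],
                         ratio: float = 0.25,
                         seed: Optional[int] = None) -> Set[int]:
    """Pick a connected set of atom indices to mask, grown from a random atom."""
    if num_atoms == 0:
        return set()
    rng = random.Random(seed)

    # Build an undirected adjacency list from the bond pairs.
    adjacency = {i: [] for i in range(num_atoms)}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    target = max(1, int(num_atoms * ratio))
    start = rng.randrange(num_atoms)
    masked = {start}
    queue = deque([start])

    # Breadth-first growth with shuffled neighbours until the target is reached.
    while queue and len(masked) < target:
        node = queue.popleft()
        neighbours = adjacency[node][:]
        rng.shuffle(neighbours)
        for nb in neighbours:
            if nb not in masked and len(masked) < target:
                masked.add(nb)
                queue.append(nb)
    return masked


# Heavy-atom graph of nitrobenzene (9 atoms) as a toy input.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
         (0, 6), (6, 7), (6, 8)]
masked = random_subgraph_mask(9, bonds, ratio=0.25, seed=0)
```

Because the start atom is random, the masked region may or may not overlap a structural alert, which is the behaviour the chemistry-wise augmentations are meant to replace.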

I. Makarov: The work of Ilya Makarov was carried out within the framework of the strategic project “Digital Business” of the Strategic Academic Leadership Program “Priority 2030” at NUST MISiS.

Corresponding author

Correspondence to Ilya Makarov.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ondar, E., Makarov, I. (2023). Chemistry-Wise Augmentations for Molecule Graph Self-supervised Representation Learning. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2023. Lecture Notes in Computer Science, vol 14135. Springer, Cham. https://doi.org/10.1007/978-3-031-43078-7_27

  • DOI: https://doi.org/10.1007/978-3-031-43078-7_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43077-0

  • Online ISBN: 978-3-031-43078-7

  • eBook Packages: Computer Science (R0)
