BayesGrad: Explaining Predictions of Graph Convolutional Networks

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11305)


Recent advances in graph convolutional networks have significantly improved the performance of chemical property prediction, raising a new research question: “how do we explain the predictions of graph convolutional networks?” One possible approach is to visualize the substructures that serve as evidence for the predictions. In chemical property prediction tasks, the training set is often small, the labels are often imbalanced (only a few samples belong to one class while most belong to the others), or both. Either condition makes the learned parameters of the machine learning model uncertain. To address this uncertainty, we propose BayesGrad, which uses the Bayesian predictive distribution to define the importance of each node in an input graph; this importance is computed efficiently using the dropout technique. We demonstrate that BayesGrad successfully visualizes the substructures responsible for the label prediction in an artificial experiment, even when the sample size is small. Furthermore, we use a real dataset to evaluate the effectiveness of the visualization. The basic idea of BayesGrad is not limited to graph-structured data and can be applied to other data types.
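The idea of scoring nodes through the predictive distribution can be sketched in a few lines. The following is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation: it assumes a hypothetical toy model (one graph-convolution layer, ReLU, dropout on the hidden units, sum pooling, sigmoid output), treats each sampled dropout mask as one draw of the weights from the approximate posterior, averages the gradient of the output with respect to every node feature over those draws, and reduces each node's averaged gradient to a single score with its L2 norm (the reduction is an assumption; gradient times input is another common choice). Gradients are estimated by central finite differences rather than backpropagation only to avoid external libraries. The names `forward`, `node_gradients`, and `bayesgrad_scores` are hypothetical.

```python
import math
import random


def forward(x, adj, W, w_out, mask):
    """Hypothetical toy model: one graph convolution, ReLU, sum pooling, sigmoid.

    x     : list of node feature vectors, each of length d
    adj   : adjacency list; adj[v] lists the neighbours of node v
    W     : hidden-by-d weight matrix (list of rows)
    w_out : readout weights, one per hidden unit
    mask  : dropout mask, one entry per hidden unit, already scaled by 1/(1-p)
    """
    d = len(x[0])
    pooled = [0.0] * len(W)
    for v in range(len(x)):
        # Aggregate the features of node v and its neighbours.
        agg = [sum(x[u][j] for u in [v] + adj[v]) for j in range(d)]
        for k, row in enumerate(W):
            pre = sum(row[j] * agg[j] for j in range(d))
            # ReLU, then the fixed dropout mask for hidden unit k.
            pooled[k] += max(0.0, pre) * mask[k]
    logit = sum(w * h for w, h in zip(w_out, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


def node_gradients(x, adj, W, w_out, mask, eps=1e-5):
    """d(output)/d(x[v][j]) for every node feature, by central differences."""
    grads = []
    for v in range(len(x)):
        g_v = []
        for j in range(len(x[v])):
            x_plus = [row[:] for row in x]
            x_minus = [row[:] for row in x]
            x_plus[v][j] += eps
            x_minus[v][j] -= eps
            diff = forward(x_plus, adj, W, w_out, mask) - forward(x_minus, adj, W, w_out, mask)
            g_v.append(diff / (2 * eps))
        grads.append(g_v)
    return grads


def bayesgrad_scores(x, adj, W, w_out, p=0.25, n_samples=50, seed=0):
    """Average gradients over sampled dropout masks (0 <= p < 1), one score per node."""
    rng = random.Random(seed)
    n, d = len(x), len(x[0])
    mean_grad = [[0.0] * d for _ in range(n)]
    for _ in range(n_samples):
        # Drop each hidden unit with probability p; rescale kept units by 1/(1-p).
        mask = [0.0 if rng.random() < p else 1.0 / (1.0 - p) for _ in W]
        g = node_gradients(x, adj, W, w_out, mask)
        for v in range(n):
            for j in range(d):
                mean_grad[v][j] += g[v][j] / n_samples
    # Assumed reduction: L2 norm of each node's averaged gradient.
    return [math.sqrt(sum(gj * gj for gj in mean_grad[v])) for v in range(n)]


if __name__ == "__main__":
    # Hypothetical 4-node path graph 0-1-2-3 with two features per node.
    adj = [[1], [0, 2], [1, 3], [2]]
    x = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.1]]
    W = [[0.8, -0.3], [0.1, 0.9], [-0.5, 0.4]]
    w_out = [1.0, -0.7, 0.5]
    print(bayesgrad_scores(x, adj, W, w_out))
```

With `p=0.0` every mask keeps all units, so the averaged gradient reduces to the ordinary (single-model) gradient; a positive `p` instead averages over many plausible models, which is what lets the score reflect parameter uncertainty.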


Keywords: Machine learning, Deep learning, Interpretability, Cheminformatics, Graph convolution



This research was supported by JSPS KAKENHI Grant Number 15H01704, Japan.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Kyoto University, Kyoto, Japan
  2. Preferred Networks, Inc., Tokyo, Japan
  3. University of Tsukuba, Tsukuba, Japan
