BayesGrad: Explaining Predictions of Graph Convolutional Networks
Recent advances in graph convolutional networks have significantly improved the performance of chemical property prediction, raising a new research question: “How do we explain the predictions of graph convolutional networks?” One approach to answering this question is to visualize the evidence substructures responsible for the predictions. In chemical property prediction tasks, the training set is often small and/or suffers from label imbalance, where only a few samples belong to one class and the majority belong to the other classes. Both conditions lead to uncertainty in the learned parameters of the machine learning model. To address this uncertainty, we propose BayesGrad, which uses the Bayesian predictive distribution to define the importance of each node in an input graph and computes it efficiently using the dropout technique. We demonstrate that BayesGrad successfully visualizes the substructures responsible for the label prediction in an artificial experiment, even when the sample size is small. Furthermore, we use a real dataset to evaluate the effectiveness of the visualization. The basic idea of BayesGrad is not limited to graph-structured data and can be applied to other data types.
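The idea of averaging gradient-based node importance over dropout samples can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the one-layer graph model, the sum readout, the aggregation of the averaged gradient by absolute sum over feature dimensions, and all names (`dropout_gradient`, `bayes_grad_saliency`, `W1`, `w2`) are assumptions chosen for illustration only.

```python
import numpy as np

def dropout_gradient(X, A, W1, w2, p, rng):
    """Gradient of a scalar output w.r.t. node features for one dropout sample.

    Hypothetical toy model (an assumption, not the paper's network):
        Z = A @ X @ W1                      # one graph-convolution-like layer
        y = sum(relu(Z) * mask / (1 - p) @ w2)   # dropout, then sum readout
    """
    Z = A @ X @ W1
    # Sample a dropout mask: each hidden unit is kept with probability 1 - p.
    mask = rng.random(Z.shape) >= p
    scale = mask / (1.0 - p)
    # dy/dZ: ReLU derivative times dropout scaling times readout weights.
    dZ = (Z > 0) * scale * w2
    # Chain rule back to the input node features: dy/dX = A^T (dy/dZ) W1^T.
    return A.T @ dZ @ W1.T

def bayes_grad_saliency(X, A, W1, w2, p=0.25, n_samples=100, seed=0):
    """Average the input gradient over dropout samples, then score each node."""
    rng = np.random.default_rng(seed)
    grads = [dropout_gradient(X, A, W1, w2, p, rng) for _ in range(n_samples)]
    mean_grad = np.mean(grads, axis=0)
    # Assumed aggregation: absolute sum over feature dimensions per node.
    return np.abs(mean_grad).sum(axis=1)

# Toy input: a 4-node path graph with self-loops and random weights.
init_rng = np.random.default_rng(42)
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
X = init_rng.normal(size=(4, 3))    # 4 nodes, 3 features each
W1 = init_rng.normal(size=(3, 5))   # hidden width 5
w2 = init_rng.normal(size=5)

saliency = bayes_grad_saliency(X, A, W1, w2)
print(saliency)  # one non-negative importance score per node
```

Setting `p=0` in this sketch reduces it to a plain gradient saliency map from a single deterministic forward pass; the dropout average is what accounts for parameter uncertainty.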
Keywords: Machine learning, Deep learning, Interpretability, Cheminformatics, Graph convolution
This research was supported by JSPS KAKENHI Grant Number 15H01704, Japan.