Interpreting Layered Neural Networks via Hierarchical Modular Representation

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1143)

Abstract

Interpreting the prediction mechanism of complex models is currently one of the most important tasks in the machine learning field, especially for layered neural networks, which have achieved high predictive performance on various practical data sets. To reveal the global structure of a trained neural network in an interpretable way, a series of clustering methods has been proposed that decompose the units into clusters according to the similarity of their inference roles. These studies have two main problems: (1) there is no prior knowledge of the optimal resolution for the decomposition, that is, the appropriate number of clusters, and (2) there is no method for determining whether the outputs of each cluster are positively or negatively correlated with the input and output unit values. In this paper, to solve these problems, we propose a method for obtaining a hierarchical modular representation of a layered neural network. Applying a hierarchical clustering method to a trained network reveals a tree-structured relationship among the hidden layer units, based on feature vectors defined by each unit's correlation with the input and output unit values.
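The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the feature vector of a hidden unit is the Pearson correlation of its activation with each input and output unit over a data set, and that Ward's criterion is used for the agglomerative clustering. The function names (`pearson`, `feature_vectors`, `ward_tree`) and the toy data are hypothetical and chosen only for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def feature_vectors(hidden, inputs, outputs):
    """One vector per hidden unit: its correlation with every input and output unit.
    Assumption: each argument is a list of per-unit series over the same samples."""
    return [[pearson(h, u) for u in inputs + outputs] for h in hidden]


def ward_tree(vectors):
    """Naive agglomerative clustering with Ward's criterion.
    Returns the merges as (members_a, members_b, cost), from first merge to root."""
    clusters = [([i], list(v)) for i, v in enumerate(vectors)]  # (members, centroid)
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in within-cluster sum of squares if a and b are merged.
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        merges.append((sorted(ma), sorted(mb), cost))
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((ma + mb, centroid))
    return merges


def noisy(series, sign):
    """Toy hidden unit: a signed copy of an input series plus small Gaussian noise."""
    return [sign * v + 0.1 * random.gauss(0, 1) for v in series]


# Toy data (hypothetical): two inputs, four hidden units, one output.
random.seed(0)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
hidden = [noisy(x1, +1), noisy(x1, +1), noisy(x2, -1), noisy(x2, -1)]
y = [a - b for a, b in zip(x1, x2)]

feats = feature_vectors(hidden, [x1, x2], [y])
merges = ward_tree(feats)
for members_a, members_b, cost in merges:
    print(members_a, "+", members_b, "cost=%.3f" % cost)
```

Reading the merge list from first to last gives the tree: cutting it at any level yields a clustering at that resolution, so the number of clusters does not have to be fixed in advance. The sign of each feature-vector entry indicates whether a unit (or, after averaging, a cluster) is positively or negatively correlated with a given input or output unit.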

Keywords

Interpretable machine learning, Neural network, Hierarchical clustering

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. NTT Communication Science Laboratories, Atsugi-shi, Japan
