Capsule Networks for Attention Under Occlusion

  • Antonio Rodríguez-Sánchez
  • Tobias Dick
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11731)


Capsule Neural Networks (CapsNets) are an attempt to model the neural organization found in biological neural networks. Through the routing-by-agreement algorithm, an attention mechanism is implemented: individual capsules focus on specific upstream capsules while ignoring the rest. Using this routing algorithm, CapsNets are able to attend to overlapping digits from the MNIST dataset. In this work, we evaluate the attention capabilities of capsule networks with routing-by-agreement on occluded shape stimuli of the kind used in neurophysiology experiments. To do so, we implement a more compact type of capsule network. Our results on classifying both non-occluded and occluded shapes show that CapsNets can indeed differentiate occlusions from near-occlusion situations, as real biological neurons do. In our experiments, reconstruction of the occluded stimuli also yields promising results.


Keywords: Capsule Networks · Overlapping datasets · Deep learning
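The routing-by-agreement mechanism referred to above can be sketched in a few lines. This is a minimal NumPy illustration of the dynamic-routing procedure from Sabour et al. (not the authors' actual implementation): routing logits are refined over a few iterations, so that each output capsule attends to the upstream prediction vectors that agree with it. Shapes and iteration count are illustrative assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Non-linearity that preserves direction: short vectors shrink toward 0,
    # long vectors saturate to length just under 1.
    norm2 = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

def routing_by_agreement(u_hat, num_iters=3):
    """u_hat: prediction vectors, shape (num_in, num_out, dim_out)."""
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits, start uniform
    for _ in range(num_iters):
        # Coupling coefficients: softmax over output capsules.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = (c[..., None] * u_hat).sum(axis=0)    # weighted sum of predictions
        v = squash(s)                             # candidate output capsules
        # Agreement: dot product between each prediction and each output;
        # predictions that agree with an output get routed to it more strongly.
        b = b + np.einsum('iod,od->io', u_hat, v)
    return v

# Toy usage: 8 input capsules route to 4 output capsules of dimension 16.
u_hat = np.random.randn(8, 4, 16) * 0.1
v = routing_by_agreement(u_hat)
print(v.shape)  # (4, 16)
```

The attention interpretation is that the coupling coefficients `c` act as soft attention weights: after a few iterations, an input capsule's vote is concentrated on the output capsule whose pose it agrees with, which is what lets the network separate overlapping or occluding shapes.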



Acknowledgements. We would like to thank Sebastian Stabinger for his useful comments and Prof. Anitha Pasupathy for providing the program to create the single shape stimuli.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science, University of Innsbruck, Innsbruck, Austria
