Cluster Computing, Volume 22, Supplement 4, pp 9371–9383

Weighted pooling for image recognition of deep convolutional neural networks

  • Xiaoning Zhu
  • Qingyue Meng
  • Bojian Ding
  • Lize Gu
  • Yixian Yang

Abstract

Convolutional neural networks use several traditional pooling methods, such as max pooling, average pooling and stochastic pooling (Zeiler and Fergus, Stochastic pooling for regularization of deep convolutional neural networks, 2013), each of which determines the pooled result from the distribution of the activations in the pooling region. However, it is difficult for the feature-mapping process to select an activation that perfectly represents the pooling region, and this can lead to over-fitting. In this paper, the theoretical basis comes from information theory (Shannon in Bell Syst. Tech. J. 27:379–423, 1948). First, we quantify the information entropy of each pooling region; we then propose an efficient pooling method that compares the mutual information between each activation and the pooling region in which it is located. Based on this mutual information, we assign a different weight to each activation, and we call the resulting method weighted pooling. The main features of weighted pooling are as follows: (1) the information quantity of a pooling region is quantified with information theory for the first time; (2) the contribution of each activation to eliminating the uncertainty of the pooling region in which it is located is also quantified for the first time; (3) when choosing a representative for a pooling region, the weight of each activation is a clearly better criterion than the raw activation value. In the experiments, we use the MNIST (LeCun in The MNIST database, 2012) and CIFAR-10 (Krizhevsky in Learning multiple layers of features from tiny images, University of Toronto, 2009) data sets to compare different pooling methods. The results show that weighted pooling achieves higher recognition accuracy than the other pooling methods and reaches a new state of the art.
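To make the pooling operations named above concrete, here is a minimal Python sketch that applies them to one flattened pooling region. It is an illustration under stated assumptions, not the authors' implementation: activations are assumed non-negative (as after a ReLU), region entropy is assumed to be the Shannon entropy of the activations normalized to sum to 1 (the same normalization stochastic pooling uses), and all function names are hypothetical. The abstract does not give the mutual-information weighting formula, so `weighted_pool` simply takes the weights as an argument rather than guessing at it.

```python
import math
import random
from typing import List, Optional, Sequence

# One pooling region, flattened (e.g. a 2x2 window -> 4 values).
# Assumption: activations are non-negative, as after a ReLU.
Region = Sequence[float]


def max_pool(region: Region) -> float:
    """Max pooling: keep the largest activation."""
    return max(region)


def avg_pool(region: Region) -> float:
    """Average pooling: mean of all activations."""
    return sum(region) / len(region)


def probabilities(region: Region) -> List[float]:
    """Normalize activations to p_i = a_i / sum_j a_j (uniform if all are zero)."""
    total = sum(region)
    if total == 0:
        return [1.0 / len(region)] * len(region)
    return [a / total for a in region]


def stochastic_pool(region: Region, rng: Optional[random.Random] = None) -> float:
    """Training-time stochastic pooling: sample one activation with probability p_i."""
    rng = rng or random.Random()
    return rng.choices(list(region), weights=probabilities(region), k=1)[0]


def weighted_pool(region: Region, weights: Sequence[float]) -> float:
    """Generic weighted pooling: sum_i w_i * a_i, with weights rescaled to sum to 1.

    Hypothetical helper: the paper derives w_i from mutual information,
    which is not reproduced here, so the weights are supplied by the caller.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w / total * a for w, a in zip(weights, region))


def region_entropy(region: Region) -> float:
    """Shannon entropy (in bits) of the normalized activations of one region."""
    return -sum(p * math.log2(p) for p in probabilities(region) if p > 0)


if __name__ == "__main__":
    region = [1.0, 3.0, 0.0, 4.0]
    print(max_pool(region))                              # 4.0
    print(avg_pool(region))                              # 2.0
    print(weighted_pool(region, probabilities(region)))  # 3.25
    print(round(region_entropy(region), 4))              # 1.4056
    print(stochastic_pool(region, random.Random(0)))     # one sampled activation
```

Passing the normalized activations from `probabilities` as the weights reproduces the test-time "probabilistic weighting" of stochastic pooling, which is why that case needs no separate function; any information-theoretic weighting, such as the mutual-information weights the abstract describes, would plug into `weighted_pool` in the same way.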

Keywords

Pooling method, Convolutional neural network, Image recognition, Information theory

Acknowledgements

The authors would like to thank the reviewers for their helpful advice. Funding from the National Science and Technology Major Project (Grant No. 2017YFB0803001), the National Natural Science Foundation of China (Grant No. 61502048), the Beijing Science and Technology Planning Project (Grant No. Z161100000216145) and the National “242” Information Security Program (2015A136) is gratefully acknowledged.

References

  1. Zeiler, M.D., Fergus, R.: Stochastic pooling for regularization of deep convolutional neural networks. arXiv preprint (2013)
  2. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948)
  3. Krizhevsky, A.: Learning multiple layers of features from tiny images. Technical Report TR-2009, University of Toronto (2009)
  4. LeCun, Y.: The MNIST database. http://yann.lecun.com/exdb/mnist/ (2012)
  5. Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization (2016)
  6. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)
  7. LeCun, Y., Boser, B., Denker, J.S., Howard, R.E., Hubbard, W., Jackel, L.D., Henderson, D.: Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems 2, pp. 396–404. Morgan Kaufmann, San Francisco (1990)
  8. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS, pp. 1106–1114 (2012)
  9. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. Comput. Sci. (2014)
  10. Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: NIPS (2014). CoRR abs/1406.2199
  11. Szegedy, C., Liu, W., Jia, Y., et al.: Going deeper with convolutions, pp. 1–9 (2014)
  12. He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
  13. Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017)
  14. Zhang, B., Li, Z., Cao, X., Ye, Q., Chen, C., Shen, L., Perina, A., Ji, R.: Output constraint transfer for kernelized correlation filter in tracking. IEEE Trans. Syst. Man Cybern. 47(4), 693–703 (2017)
  15. Wang, L., Zhang, B., Yang, W.: Boosting-like deep convolutional network for pedestrian detection. In: Biometric Recognition. Springer International Publishing (2015)
  16. Zhang, B., Gu, J., Chen, C., Han, J., Su, X., Cao, X., Liu, J.: One-two-one network for compression artifacts reduction in remote sensing. ISPRS J. Photogramm. Remote Sens. (2018)
  17. Zhang, B., Liu, W., Mao, Z., et al.: Cooperative and geometric learning algorithm (CGLA) for path planning of UAVs with limited information. Automatica 50(3), 809–820 (2014)
  18. Russakovsky, O., Deng, J., Su, H., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
  19. Abadi, M., Agarwal, A., Barham, P., et al.: TensorFlow: large-scale machine learning on heterogeneous distributed systems (2016)
  20. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. Comput. Sci. (2014)
  21. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: ECCV 2014, LNCS 8689, pp. 818–833 (2014)
  22. Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994)
  23. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. J. Mach. Learn. Res. 9, 249–256 (2010)
  24. He, K., Zhang, X., Ren, S., et al.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification, pp. 1026–1034 (2015)
  25. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456. JMLR.org (2015)
  26. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
  27. Zeiler, M.D.: ADADELTA: an adaptive learning rate method. Comput. Sci. (2012)
  28. Boureau, Y.L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: International Conference on Machine Learning, pp. 111–118 (2010)
  29. Hubel, D.H., Wiesel, T.N.: Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiol. 160(1), 106–154 (1962)
  30. Koenderink, J.J., Van Doorn, A.J.: The structure of locally orderless images. Int. J. Comput. Vis. 31(2–3), 159–168 (1999)
  31. Graham, B.: Fractional max-pooling. arXiv preprint (2014)
  32. Harada, T., Ushiku, Y., Yamashita, Y., et al.: Discriminative spatial pyramid. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1617–1624 (2011)
  33. He, K., Zhang, X., Ren, S., et al.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015)
  34. Fan, E.G.: Extended tanh-function method and its applications to nonlinear equations. Phys. Lett. A 277(4), 212–218 (2000)
  35. Hinton, G.E., Srivastava, N., Krizhevsky, A., et al.: Improving neural networks by preventing co-adaptation of feature detectors. Comput. Sci. 3(4), 212–223 (2012)
  36. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results (2007)
  37. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017)

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China
  2. Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang, China

Personalised recommendations