Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition

  • Dominik Scherer
  • Andreas Müller
  • Sven Behnke
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6354)


A common practice to gain invariant features in object recognition models is to aggregate multiple low-level features over a small neighborhood. However, the differences between those models make a comparison of the properties of different aggregation functions hard. Our aim is to gain insight into different functions by directly comparing them on a fixed architecture for several common object recognition tasks. Empirical results show that a maximum pooling operation significantly outperforms subsampling operations. Despite their shift-invariant properties, overlapping pooling windows offer no significant improvement over non-overlapping pooling windows. By applying this knowledge, we achieve state-of-the-art error rates of 4.57% on the NORB normalized-uniform dataset and 5.6% on the NORB jittered-cluttered dataset.
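To make the compared operations concrete, the following is a minimal sketch of window-based aggregation over a 2-D feature map. It is not the authors' implementation: the helper names `pool2d` and `average`, the toy feature map, and the window sizes are all hypothetical, and subsampling is approximated here as a plain window average, since the abstract does not define the exact functions. A stride equal to the window size gives non-overlapping windows; a smaller stride gives overlapping ones.

```python
def average(window):
    """Plain mean of the window values (assumed stand-in for subsampling)."""
    return sum(window) / len(window)


def pool2d(image, size, stride, reduce_fn):
    """Slide a size x size window over a 2-D list with the given stride
    and reduce each window to a single value with reduce_fn."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect the values covered by the current window.
            window = [image[r + i][c + j]
                      for i in range(size) for j in range(size)]
            out_row.append(reduce_fn(window))
        out.append(out_row)
    return out


# Hypothetical 4x4 feature map, used only for illustration.
feature_map = [
    [1, 3, 2, 0],
    [5, 2, 1, 1],
    [0, 1, 4, 2],
    [2, 0, 3, 6],
]

# Non-overlapping 2x2 windows (stride == window size).
max_pooled = pool2d(feature_map, size=2, stride=2, reduce_fn=max)
avg_pooled = pool2d(feature_map, size=2, stride=2, reduce_fn=average)

# Overlapping 2x2 windows (stride < window size).
overlap_max = pool2d(feature_map, size=2, stride=1, reduce_fn=max)

print(max_pooled)   # [[5, 2], [2, 6]]
print(avg_pooled)   # [[2.75, 1.0], [0.75, 3.75]]
print(overlap_max)  # [[5, 3, 2], [5, 4, 4], [2, 4, 6]]
```

With non-overlapping windows the 4x4 map shrinks to 2x2, while the overlapping variant yields a 3x3 map. The maximum keeps only the strongest response in each window, whereas the average blends all four values.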


Recognition Rate, Window Function, Convolutional Neural Network, Convolutional Layer, Test Error Rate
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Dominik Scherer (1)
  • Andreas Müller (1)
  • Sven Behnke (1)
  1. Institute of Computer Science VI, Autonomous Intelligent Systems Group, University of Bonn, Bonn, Germany
