Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition

Scherer, Dominik; Müller, Andreas; Behnke, Sven

doi:10.1007/978-3-642-15825-4_10

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6354)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

A common practice to gain invariant features in object recognition models is to aggregate multiple low-level features over a small neighborhood. However, the differences between those models make it hard to compare the properties of different aggregation functions. Our aim is to gain insight into different functions by directly comparing them on a fixed architecture for several common object recognition tasks. Empirical results show that a maximum pooling operation significantly outperforms subsampling operations. Despite their shift-invariant properties, overlapping pooling windows offer no significant improvement over non-overlapping pooling windows. By applying this knowledge, we achieve state-of-the-art error rates of 4.57% on the NORB normalized-uniform dataset and 5.6% on the NORB jittered-cluttered dataset.
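
To make the terms in the abstract concrete, the following is a minimal sketch of aggregating features over a small neighborhood. It is not the authors' implementation: the function names `pool2d` and `mean`, the 4x4 example feature map, and the 2x2 window are all assumptions chosen for illustration, and plain averaging stands in for "subsampling", whose exact definition is not given in the abstract. A stride equal to the window size gives non-overlapping windows; a smaller stride gives overlapping ones.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def pool2d(feature_map: Matrix, window: int, stride: int,
           reduce_fn: Callable[[Sequence[float]], float]) -> Matrix:
    """Reduce each window x window neighborhood of feature_map with reduce_fn.

    stride == window -> non-overlapping windows
    stride <  window -> overlapping windows
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled: Matrix = []
    for r in range(0, rows - window + 1, stride):
        pooled_row = []
        for c in range(0, cols - window + 1, stride):
            # Collect the values inside the current neighborhood.
            patch = [feature_map[r + i][c + j]
                     for i in range(window) for j in range(window)]
            pooled_row.append(reduce_fn(patch))
        pooled.append(pooled_row)
    return pooled


def mean(values: Sequence[float]) -> float:
    """Plain average; used here as a simple stand-in for subsampling."""
    return sum(values) / len(values)


# Hypothetical 4x4 feature map (illustrative values only).
fmap = [
    [1.0, 2.0, 0.0, 1.0],
    [3.0, 4.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 6.0],
    [1.0, 0.0, 7.0, 8.0],
]

max_nonoverlap = pool2d(fmap, window=2, stride=2, reduce_fn=max)
avg_nonoverlap = pool2d(fmap, window=2, stride=2, reduce_fn=mean)
max_overlap = pool2d(fmap, window=2, stride=1, reduce_fn=max)

print(max_nonoverlap)  # [[4.0, 1.0], [1.0, 8.0]]
print(avg_nonoverlap)  # [[2.5, 0.5], [0.5, 6.5]]
print(max_overlap)     # [[4.0, 4.0, 1.0], [4.0, 5.0, 6.0], [1.0, 7.0, 8.0]]
```

In this toy example, max pooling keeps the strongest response in each neighborhood while averaging dilutes it, and the overlapping variant produces a larger 3x3 output from the same input. Whether either difference helps recognition is the empirical question the paper addresses.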

Author information

Authors and Affiliations

Institute of Computer Science VI, Autonomous Intelligent Systems Group, University of Bonn, Römerstr. 164, 53117, Bonn, Germany
Dominik Scherer, Andreas Müller & Sven Behnke

Editor information

Editors and Affiliations

Department of Informatics, TEI of Thessaloniki, 57400, Sindos, Greece
Konstantinos Diamantaras
Department of Informatics, Nicolaus Copernicus University, School of Physics, Astronomy, and Informatics, ul. Grudziadzka 5, 87-100, Torun, Poland
Wlodek Duch
Department of Forestry and Management of the Environment and Natural Resources, Democritus University of Thrace, Pantazidou 193, 68200, Orestiada Thrace, Greece
Lazaros S. Iliadis

Cite this paper

Scherer, D., Müller, A., Behnke, S. (2010). Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition. In: Diamantaras, K., Duch, W., Iliadis, L.S. (eds) Artificial Neural Networks – ICANN 2010. ICANN 2010. Lecture Notes in Computer Science, vol 6354. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15825-4_10

DOI: https://doi.org/10.1007/978-3-642-15825-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15824-7
Online ISBN: 978-3-642-15825-4
eBook Packages: Computer Science, Computer Science (R0)
