Learned-Norm Pooling for Deep Feedforward and Recurrent Neural Networks

  • Caglar Gulcehre
  • Kyunghyun Cho
  • Razvan Pascanu
  • Yoshua Bengio
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8724)


In this paper we propose and investigate a novel nonlinear unit, called L p unit, for deep neural networks. The proposed L p unit receives signals from several projections of a subset of units in the layer below and computes a normalized L p norm. We notice two interesting interpretations of the L p unit. First, the proposed unit can be understood as a generalization of a number of conventional pooling operators such as average, root-mean-square and max pooling widely used in, for instance, convolutional neural networks (CNN), HMAX models and neocognitrons. Furthermore, the L p unit is, to a certain degree, similar to the recently proposed maxout unit [13] which achieved the state-of-the-art object recognition results on a number of benchmark datasets. Secondly, we provide a geometrical interpretation of the activation function based on which we argue that the L p unit is more efficient at representing complex, nonlinear separating boundaries. Each L p unit defines a superelliptic boundary, with its exact shape defined by the order p. We claim that this makes it possible to model arbitrarily shaped, curved boundaries more efficiently by combining a few L p units of different orders. This insight justifies the need for learning different orders for each unit in the model. We empirically evaluate the proposed L p units on a number of datasets and show that multilayer perceptrons (MLP) consisting of the L p units achieve the state-of-the-art results on a number of benchmark datasets. Furthermore, we evaluate the proposed L p unit on the recently proposed deep recurrent neural networks (RNN).


deep learning Lp unit multilayer perceptron 


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I.J., Bergeron, A., Bouchard, N., Bengio, Y.: Theano: new features and speed improvements. In: Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop (2012)Google Scholar
  2. 2.
    Bayer, J., Osendorfer, C., Korhammer, D., Chen, N., Urban, S., van der Smagt, P.: On fast dropout and its applicability to recurrent networks. arXiv:1311.0701 (cs.NE) (2013)Google Scholar
  3. 3.
    Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Machine Learning Res. 13, 281–305 (2012)zbMATHMathSciNetGoogle Scholar
  4. 4.
    Bergstra, J., Bengio, Y., Louradour, J.: Suitability of V1 energy models for object classification. Neural Computation 23(3), 774–790 (2011)CrossRefzbMATHGoogle Scholar
  5. 5.
    Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., Bengio, Y.: Theano: a CPU and GPU math expression compiler. In: Proceedings of the Python for Scientific Computing Conference (SciPy) (2010)Google Scholar
  6. 6.
    Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., Bengio, Y.: Theano: a CPU and GPU math expression compiler. In: Proceedings of the Python for Scientific Computing Conference (SciPy). Oral Presentation (June 2010)Google Scholar
  7. 7.
    Boulanger-Lewandowski, N., Bengio, Y., Vincent, P.: Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In: ICML 2012 (2012)Google Scholar
  8. 8.
    Boureau, Y., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in vision algorithms. In: Proc. International Conference on Machine learning, ICML 2010 (2010)Google Scholar
  9. 9.
    Ciresan, D., Meier, U., Masci, J., Schmidhuber, J.: Multi column deep neural network for traffic sign classification. Neural Networks 32, 333–338 (2012)CrossRefGoogle Scholar
  10. 10.
    Fukushima, K.: Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics 36, 193–202 (1980)CrossRefzbMATHMathSciNetGoogle Scholar
  11. 11.
    Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: AISTATS 2011 (2011)Google Scholar
  12. 12.
    Goodfellow, I.J., Warde-Farley, D., Lamblin, P., Dumoulin, V., Mirza, M., Pascanu, R., Bergstra, J., Bastien, F., Bengio, Y.: Pylearn2: a machine learning research library. arXiv preprint arXiv:1308.4214 (2013)Google Scholar
  13. 13.
    Goodfellow, I.J., Warde-Farley, D., Mirza, M., Courville, A., Bengio, Y.: Maxout networks. In: ICML 2013 (2013)Google Scholar
  14. 14.
    Gulcehre, C., Bengio, Y.: Knowledge matters: Importance of prior information for optimization. In: International Conference on Learning Representations, ICLR 2013 (2013)Google Scholar
  15. 15.
    Haykin, S.: Neural Networks and Learning Machines, 3rd edn. Prentice Hall (November 2008)Google Scholar
  16. 16.
    Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Improving neural networks by preventing co-adaptation of feature detectors. Technical report, arXiv:1207.0580 (2012)Google Scholar
  17. 17.
    Hubel, D., Wiesel, T.: Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology (London) 195, 215–243 (1968)Google Scholar
  18. 18.
    Hyvärinen, A., Köster, U.: Complex cell pooling and the statistics of natural images. Network: Computation in Neural Systems 18(2), 81–100 (2007)CrossRefMathSciNetGoogle Scholar
  19. 19.
    Jarrett, K., Kavukcuoglu, K., Ranzato, M., LeCun, Y.: What is the best multi-stage architecture for object recognition? In: Proc. International Conference on Computer Vision (ICCV 2009), pp. 2146–2153. IEEE (2009)Google Scholar
  20. 20.
    Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems 25, NIPS 2012 (2012)Google Scholar
  21. 21.
    LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)CrossRefGoogle Scholar
  22. 22.
    Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Bottou, L., Littman, M. (eds.) Proceedings of the Twenty-seventh International Conference on Machine Learning (ICML 2010), pp. 807–814. ACM (2010)Google Scholar
  23. 23.
    Pascanu, R., Bengio, Y.: Revisiting natural gradient for deep networks. Technical report, arXiv:1301.3584 (2013)Google Scholar
  24. 24.
    Pascanu, R., Gulcehre, C., Cho, K., Bengio, Y.: How to construct deep recurrent neural networks. arXiv:1312.6026 (cs.NE) (December 2013)Google Scholar
  25. 25.
    Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. In: ICML 2013 (2013)Google Scholar
  26. 26.
    Ranzato, M., Mnih, V., Susskind, J.M., Hinton, G.E.: Modeling natural images using gated mrfs. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(9), 2206–2222 (2013)CrossRefGoogle Scholar
  27. 27.
    Riesenhuber, M., Poggio, T.: Hierarchical models of object recognition in cortex. Nature Neuroscience (1999)Google Scholar
  28. 28.
    Rifai, S., Dauphin, Y., Vincent, P., Bengio, Y., Muller, X.: The manifold tangent classifier. In: NIPS 2011 (2011)Google Scholar
  29. 29.
    Rosenblatt, F.: Principles of neurodynamics: perceptrons and the theory of brain mechanisms. Report (Cornell Aeronautical Laboratory). Spartan Books (1962)Google Scholar
  30. 30.
    Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323, 533–536 (1986)CrossRefGoogle Scholar
  31. 31.
    Susskind, J., Anderson, A., Hinton, G.E.: The Toronto face dataset. Technical Report UTML TR 2010-001, U. Toronto (2010)Google Scholar
  32. 32.
    Trebar, M., Steele, N.: Application of distributed svm architectures in classifying forest data cover types. Computers and Electronics in Agriculture 63(2), 119–130 (2008)CrossRefGoogle Scholar
  33. 33.
    Yang, J., Yu, K., Gong, Y., Huang, T.: Linear spatial pyramid matching using sparse coding for image classification. In: Proc. Conference on Computer Vision and Pattern Recognition (CVPR 2010) (2010)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Caglar Gulcehre
    • 1
  • Kyunghyun Cho
    • 1
  • Razvan Pascanu
    • 1
  • Yoshua Bengio
    • 1
  1. 1.Département d’Informatique et de Recherche OpérationelleUniversité de Montréal (⋆) CIFAR FellowCanada

Personalised recommendations