Statistically-Motivated Second-Order Pooling

  • Kaicheng Yu
  • Mathieu Salzmann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11211)


Second-order pooling, a.k.a. bilinear pooling, has proven effective for deep-learning-based visual recognition. However, the resulting second-order networks yield a final representation that is orders of magnitude larger than that of standard, first-order ones, making them memory-intensive and cumbersome to deploy. Here, we introduce a general, parametric compression strategy that can produce more compact representations than existing compression techniques, yet outperform both compressed and uncompressed second-order models. Our approach is motivated by a statistical analysis of the network’s activations, relying on operations that lead to a Gaussian-distributed final representation, as inherently used by first-order deep networks. As evidenced by our experiments, this lets us outperform the state-of-the-art first-order and second-order models on several benchmark recognition datasets.
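To make the size argument concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that second-order pooling is computed as the covariance of local features, that the parametric compression takes the quadratic form z_j = w_j^T Σ w_j with a projection matrix (random here, learned in practice), and that a cube root is used to push the nonnegative outputs toward a Gaussian shape. The abstract does not specify these operations; the function names, the sizes n, c, p, and the choice of transform are all illustrative assumptions.

```python
import numpy as np


def second_order_pool(features):
    """Covariance pooling of an (n, c) array of n local c-dimensional features."""
    centered = features - features.mean(axis=0, keepdims=True)
    return centered.T @ centered / features.shape[0]


def compress(cov, weights):
    """Hypothetical parametric compression: z_j = w_j^T cov w_j for each column w_j.

    weights has shape (c, p); the result has shape (p,).
    """
    return np.einsum("ij,ik,kj->j", weights, cov, weights)


def gaussianize(z):
    """Assumed variance-stabilising step: cube root of the nonnegative quadratic forms."""
    return np.cbrt(z)


rng = np.random.default_rng(0)
n, c, p = 196, 512, 64                      # e.g. a 14x14 feature map with 512 channels
features = rng.standard_normal((n, c))      # stand-in for convolutional activations

cov = second_order_pool(features)           # (c, c) second-order representation
weights = rng.standard_normal((c, p)) / np.sqrt(c)   # random stand-in for learned parameters
z = gaussianize(compress(cov, weights))     # (p,) compact representation

print(cov.size, z.size)                     # 262144 64
```

With these illustrative sizes, the uncompressed second-order representation has 512 × 512 = 262,144 entries against 512 for first-order pooling, while the compressed vector has only 64.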


Keywords: Second-order descriptors, Convolutional neural networks, Image classification

Supplementary material

474212_1_En_37_MOESM1_ESM.pdf (199 kb)



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. CVLab, EPFL, Lausanne, Switzerland
