Second-Order Democratic Aggregation

  • Tsung-Yu Lin
  • Subhransu Maji
  • Piotr Koniusz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11207)


Aggregated second-order features extracted from deep convolutional networks have been shown to be effective for texture generation, fine-grained recognition, material classification, and scene understanding. In this paper, we study a class of orderless aggregation functions designed to minimize interference or equalize contributions in the context of second-order features. We show that they can be computed just as efficiently as their first-order counterparts and that they have favorable properties over aggregation by summation. Another line of work has shown that matrix power normalization after aggregation can significantly improve the generalization of second-order representations. We show that matrix power normalization implicitly equalizes contributions during aggregation, thus establishing a connection between matrix normalization techniques and prior work on minimizing interference. Based on this analysis, we present \(\gamma\)-democratic aggregators that interpolate between sum pooling (\(\gamma = 1\)) and democratic pooling (\(\gamma = 0\)), outperforming both on several classification tasks. Moreover, unlike power normalization, \(\gamma\)-democratic aggregation can be computed in a low-dimensional space by sketching, which allows the use of very high-dimensional second-order features. This results in state-of-the-art performance on several datasets.
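The interpolation between sum and democratic pooling can be illustrated with a minimal NumPy sketch. It rests on assumptions not stated in the abstract: each local feature \(x_i\) receives a weight \(\alpha_i\) so that its contribution \(\alpha_i \sum_j \alpha_j K_{ij}\) equals \((\sum_j K_{ij})^{\gamma}\), where \(K_{ij} = (x_i^\top x_j)^2\) is the second-order kernel, and the weights are found by a damped Sinkhorn-style iteration. The function names, the iteration count, and the update rule are illustrative choices, not necessarily the authors' implementation.

```python
import numpy as np

def gamma_democratic_weights(X, gamma=0.5, n_iter=100):
    """Hypothetical helper: per-feature weights alpha for an (n, d) feature array X.

    Assumed objective: alpha_i * sum_j alpha_j K_ij == (sum_j K_ij) ** gamma.
    """
    # Kernel between outer products: <x_i x_i^T, x_j x_j^T> = (x_i . x_j)^2
    K = (X @ X.T) ** 2
    ones = np.ones(X.shape[0])
    target = (K @ ones) ** gamma      # gamma=1: plain sum; gamma=0: equal contributions
    alpha = ones.copy()
    for _ in range(n_iter):
        sigma = alpha * (K @ alpha)   # current contribution of each feature
        alpha = alpha * np.sqrt(target / sigma)  # damped (square-root) update
    return alpha

def gamma_democratic_pool(X, gamma=0.5, n_iter=100):
    """Weighted second-order pooling: sum_i alpha_i x_i x_i^T, shape (d, d)."""
    alpha = gamma_democratic_weights(X, gamma, n_iter)
    return (X * alpha[:, None]).T @ X

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 8))      # 20 local features of dimension 8
pooled = gamma_democratic_pool(X, gamma=0.5)
print(pooled.shape)
```

Under these assumptions, \(\gamma = 1\) leaves every weight at 1 and reproduces plain sum pooling of the outer products, while \(\gamma = 0\) drives every feature toward the same unit contribution. Only the \(n \times n\) kernel matrix is needed to solve for the weights, so the \(d \times d\) outer products never have to be formed during that step.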


Keywords: Second-order features, Democratic pooling, Matrix power normalization, Tensor sketching



We acknowledge support from NSF (#1617917, #1749833) and the MassTech Collaborative grant for funding the UMass GPU cluster.




Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, USA
  2. Data61/CSIRO, Australian National University, Canberra, Australia
