Toward Faster and Simpler Matrix Normalization via Rank-1 Update

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12364)

Abstract

Bilinear pooling has been used in many computer vision tasks, and recent studies have found that matrix normalization is a vital step for bilinear pooling to achieve strong performance. The standard matrix normalization, however, requires singular value decomposition (SVD), which is not well suited to GPUs and therefore limits efficiency in both training and inference. To resolve this issue, the Newton-Schulz (NS) iteration has been proposed to approximate the matrix square root. Although it is GPU-friendly, the NS iteration still requires several expensive rounds of matrix-matrix multiplication. Furthermore, the NS iteration is incompatible with the compact bilinear features obtained from Tensor Sketch (TS) or Random Maclaurin (RM). To overcome these limitations, in this paper we propose “rank-1 update normalization” (RUN), which needs only matrix-vector multiplications and is hence substantially more efficient than the NS iteration, which relies on matrix-matrix multiplications. Moreover, RUN readily supports normalization of compact bilinear features from TS or RM. RUN is also simpler than the NS iteration and easier to implement in practice. As RUN is a differentiable procedure, it can be plugged into a CNN-based model for end-to-end training. Extensive experiments on four public benchmarks demonstrate that, for full bilinear pooling, RUN achieves comparable accuracy with a substantial speedup over the NS iteration. For compact bilinear pooling, RUN achieves comparable accuracy with a significant speedup over SVD-based normalization.
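To make the cost of the NS baseline concrete, the following is a minimal sketch of the textbook coupled Newton-Schulz iteration for the square root of a symmetric positive-definite matrix. It is not taken from the paper: the function names, the trace-based pre-scaling, and the iteration count are illustrative assumptions, and plain Python lists stand in for GPU tensors. It shows that every iteration costs three matrix-matrix products. RUN itself is not reproduced here, since the abstract does not spell out its update rule.

```python
def matmul(a, b):
    """Plain-Python matrix product (a stand-in for a GPU matmul)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def newton_schulz_sqrt(a, num_iters=10):
    """Approximate the square root of a symmetric positive-definite matrix.

    Textbook coupled iteration (hypothetical helper, not the paper's code):
        T = (3I - Z Y) / 2,  Y <- Y T,  Z <- T Z
    with Y converging to sqrt(A / trace(A)).
    """
    n = len(a)
    # Pre-scale by the trace so all eigenvalues lie in (0, 1];
    # this is one common way to ensure the iteration converges.
    trace = sum(a[i][i] for i in range(n))
    y = [[v / trace for v in row] for row in a]
    z = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    for _ in range(num_iters):
        zy = matmul(z, y)                      # matrix-matrix product 1
        t = [[((3.0 if i == j else 0.0) - zy[i][j]) / 2.0
              for j in range(n)] for i in range(n)]
        y = matmul(y, t)                       # matrix-matrix product 2
        z = matmul(t, z)                       # matrix-matrix product 3

    # Undo the pre-scaling: sqrt(A) = sqrt(trace) * sqrt(A / trace).
    scale = trace ** 0.5
    return [[v * scale for v in row] for row in y]


if __name__ == "__main__":
    a = [[4.0, 1.0], [1.0, 3.0]]
    root = newton_schulz_sqrt(a)
    recon = matmul(root, root)
    residual = max(abs(recon[i][j] - a[i][j])
                   for i in range(2) for j in range(2))
    print("max |root @ root - A| =", residual)
```

For a d × d matrix, each of these products costs on the order of d³ operations, whereas a matrix-vector product costs on the order of d², which is the gap the abstract refers to when contrasting the NS iteration with RUN.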

Supplementary material

Supplementary material 1: 504475_1_En_13_MOESM1_ESM.pdf (PDF, 171 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Cognitive Computing Lab, Baidu Research, Bellevue, USA
  2. Cognitive Computing Lab, Baidu Research, Beijing, China