Dynamic Group Convolution for Accelerating Convolutional Neural Networks

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12351)

Abstract

Replacing normal convolutions with group convolutions can significantly increase the computational efficiency of modern deep convolutional networks, and has been widely adopted in compact network architecture designs. However, existing group convolutions undermine the original network structure by permanently cutting off some connections, resulting in significant accuracy degradation. In this paper, we propose dynamic group convolution (DGC), which adaptively selects which input channels to connect within each group for individual samples on the fly. Specifically, we equip each group with a small feature selector that automatically selects the most important input channels conditioned on the input image. Multiple groups can thus adaptively capture abundant and complementary visual/semantic features for each input image. DGC preserves the original network structure while retaining computational efficiency similar to that of conventional group convolution. Extensive experiments on multiple image classification benchmarks, including CIFAR-10, CIFAR-100, and ImageNet, demonstrate its superiority over existing group convolution techniques and dynamic execution methods. The code is available at https://github.com/zhuogege1943/dgc.
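The core idea described above (each group scores all input channels conditioned on the input, then convolves only the top-scoring subset) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function name, the use of global average pooling with a learned per-channel weight as the "small feature selector", and the restriction to 1x1 kernels are simplifying assumptions made here for clarity.

```python
import numpy as np

def dgc_1x1(x, group_weights, saliency_weights, keep_ratio=0.5):
    """Minimal sketch of dynamic group convolution with 1x1 kernels.

    x:                (C_in, H, W) feature map for a single sample
    group_weights:    list of G arrays, each (C_out_g, C_in)
    saliency_weights: list of G arrays, each (C_in,) -- a stand-in for
                      the small learned feature selector of each group
    Each group keeps only a keep_ratio fraction of the input channels,
    ranked by an input-conditioned saliency score, and convolves just
    those channels, so its cost is comparable to a static group
    convolution while the channel-to-group wiring varies per sample.
    """
    c_in, h, w = x.shape
    k = max(1, int(round(keep_ratio * c_in)))
    pooled = x.mean(axis=(1, 2))                 # (C_in,) global context
    outputs = []
    for w_g, s_g in zip(group_weights, saliency_weights):
        scores = pooled * s_g                    # per-channel saliency
        idx = np.argsort(scores)[-k:]            # top-k input channels
        # 1x1 convolution restricted to the selected channels
        y = np.tensordot(w_g[:, idx], x[idx], axes=([1], [0]))
        outputs.append(y)                        # (C_out_g, H, W)
    return np.concatenate(outputs, axis=0)       # channel-wise concat
```

For example, with 8 input channels, two groups of 3 output channels each, and `keep_ratio=0.5`, every group convolves only its own top-4 input channels, and the concatenated output has shape `(6, H, W)`; which 4 channels each group sees changes from sample to sample.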

Keywords

Group convolution · Dynamic execution · Efficient network architecture

Notes

Acknowledgement

This work was partially supported by the Academy of Finland under grant 331883 and the National Natural Science Foundation of China under Grant 61872379. The authors also wish to acknowledge CSC IT Center for Science, Finland, for computational resources.

Supplementary material

Supplementary material 1 (PDF 606 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Center for Machine Vision and Signal Analysis, University of Oulu, Oulu, Finland
  2. South China University of Technology, Guangzhou, China
  3. National University of Defense Technology, Changsha, China