
Towards Accurate Binary Neural Networks via Modeling Contextual Dependencies

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13671)

Included in the following conference series: Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

Existing Binary Neural Networks (BNNs) mainly operate on local convolutions with binarization functions. However, such simple bit operations lack the ability to model contextual dependencies, which is critical for learning discriminative deep representations in vision models. In this work, we tackle this issue by presenting new designs of binary neural modules that enable BNNs to learn effective contextual dependencies. First, we propose a binary multi-layer perceptron (MLP) block as an alternative to binary convolution blocks to directly model contextual dependencies. Both short-range and long-range feature dependencies are modeled by binary MLPs, where the former provides a local inductive bias and the latter breaks the limited receptive field of binary convolutions. Second, to improve the robustness of binary models with contextual dependencies, we compute contextual dynamic embeddings to determine the binarization thresholds in general binary convolutional blocks. Armed with our binary MLP blocks and improved binary convolutions, we build BNNs with explicit Contextual Dependency modeling, termed BCDNet. On the standard ImageNet-1K classification benchmark, BCDNet achieves 72.3% Top-1 accuracy and outperforms leading binary methods by a large margin. In particular, the proposed BCDNet exceeds the state-of-the-art ReActNet-A by 2.9% Top-1 accuracy with similar operations. Our code is available at https://github.com/Sense-GVT/BCDNet.
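The abstract gives only the high-level recipe, so the sketch below is one hypothetical PyTorch reading of it rather than the authors' implementation (the actual code lives in the linked repository). In particular, the straight-through estimator, the mean-magnitude weight scaling in `BinaryLinear`, the global-average-pooled context feeding a small linear layer `to_threshold`, the depthwise 3x3 convolution as the short-range branch, and the single token-mixing MLP as the long-range branch are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through estimator backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass gradients only inside [-1, 1], a common BNN training choice.
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)


class BinaryLinear(nn.Linear):
    """Linear layer with sign-binarized weights and a per-row scale."""

    def forward(self, x):
        scale = self.weight.abs().mean(dim=1, keepdim=True)
        binary_weight = BinarizeSTE.apply(self.weight) * scale
        return F.linear(x, binary_weight, self.bias)


class DynamicThresholdBinaryAct(nn.Module):
    """Binary activation whose per-channel threshold is predicted from context."""

    def __init__(self, channels):
        super().__init__()
        self.to_threshold = nn.Linear(channels, channels)  # hypothetical predictor

    def forward(self, x):  # x: (B, C, H, W)
        context = x.mean(dim=(2, 3))                       # contextual embedding
        threshold = self.to_threshold(context)[..., None, None]
        return BinarizeSTE.apply(x - threshold)            # shifted sign binarization


class BinaryMLPBlock(nn.Module):
    """Token mixing with a short-range (local) and a long-range (global) branch."""

    def __init__(self, channels, num_tokens):
        super().__init__()
        self.act = DynamicThresholdBinaryAct(channels)
        # Short range: depthwise 3x3 supplies the local inductive bias.
        self.local_mix = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        # Long range: one binary MLP mixes all H*W tokens per channel.
        self.global_mix = BinaryLinear(num_tokens, num_tokens)

    def forward(self, x):  # x: (B, C, H, W)
        b, c, h, w = x.shape
        binary = self.act(x)
        local = self.local_mix(binary)
        global_ = self.global_mix(binary.flatten(2)).view(b, c, h, w)
        return x + local + global_                         # residual fusion


# Example: one block on a 14x14 feature map.
# block = BinaryMLPBlock(channels=64, num_tokens=14 * 14)
# out = block(torch.randn(2, 64, 14, 14))  # -> (2, 64, 14, 14)
```

The intended trade-off mirrors the abstract: the token-mixing MLP sees every spatial position at once, sidestepping the limited receptive field of a binary convolution, while the small full-precision threshold predictor adapts the binarization point to each input's context.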

References

  1. Bethge, J., Bartz, C., Yang, H., Chen, Y., Meinel, C.: MeliusNet: can binary neural networks achieve MobileNet-level accuracy? arXiv preprint arXiv:2001.05936 (2020)

  2. Chen, S., Xie, E., Ge, C., Liang, D., Luo, P.: CycleMLP: a MLP-like architecture for dense prediction. arXiv preprint arXiv:2107.10224 (2021)

  3. Chu, X., et al.: Twins: revisiting spatial attention design in vision transformers (2021)

  4. Courbariaux, M., Hubara, I., Soudry, D., El-Yaniv, R., Bengio, Y.: Binarized neural networks: training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint arXiv:1602.02830 (2016)

  5. Dai, Z., Cai, B., Lin, Y., Chen, J.: UP-DETR: unsupervised pre-training for object detection with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1601–1610 (2021)

  6. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. In: International Conference on Learning Representations (2020)

  7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  8. Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)

  9. Ji, M., Shin, S., Hwang, S., Park, G., Moon, I.C.: Refine myself by teaching myself: feature refinement via self-knowledge distillation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10664–10673 (2021)

  10. Jiang, X., Wang, N., Xin, J., Li, K., Yang, X., Gao, X.: Training binary neural network without batch normalization for image super-resolution. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 1700–1707 (2021)

  11. Joo, D., Yi, E., Baek, S., Kim, J.: Linearly replaceable filters for deep network channel pruning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 8021–8029 (2021)

  12. Khosla, A., Jayadevaprakash, N., Yao, B., Li, F.F.: Novel dataset for fine-grained image categorization: Stanford dogs. In: Proceedings of the CVPR Workshop on Fine-Grained Visual Categorization (FGVC), vol. 2. Citeseer (2011)

  13. Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3D object representations for fine-grained categorization. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 554–561 (2013)

  14. Lee, J., Kim, D., Ham, B.: Network quantization with element-wise gradient scaling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6448–6457 (2021)

  15. Lin, K., Wang, L., Liu, Z.: End-to-end human pose and mesh reconstruction with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1954–1963 (2021)

  16. Lin, M., et al.: Rotated binary neural network. In: Advances in Neural Information Processing Systems, vol. 33 (2020)

  17. Lin, X., Zhao, C., Pan, W.: Towards accurate binary convolutional neural network. In: Advances in Neural Information Processing Systems, vol. 30 (2017)

  18. Liu, C., Chen, P., Zhuang, B., Shen, C., Zhang, B., Ding, W.: SA-BNN: state-aware binary neural network. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 2091–2099 (2021)

  19. Liu, X., Ye, M., Zhou, D., Liu, Q.: Post-training quantization with multiple points: mixed precision without mixed precision. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 8697–8705 (2021)

  20. Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030 (2021)

  21. Liu, Z., Shen, Z., Li, S., Helwegen, K., Huang, D., Cheng, K.T.: How do Adam and training strategies help BNNs optimization? In: International Conference on Machine Learning. PMLR (2021)

  22. Liu, Z., Shen, Z., Savvides, M., Cheng, K.-T.: ReActNet: towards precise binary neural network with generalized activation functions. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12359, pp. 143–159. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58568-6_9

  23. Liu, Z., Wu, B., Luo, W., Yang, X., Liu, W., Cheng, K.T.: Bi-Real Net: enhancing the performance of 1-bit CNNs with improved representational capability and advanced training algorithm. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 722–737 (2018)

  24. Zhuo, L., Zhang, B., Chen, H., Yang, L., Chen, C., Zhu, Y., Doermann, D.: CP-NAS: child-parent neural architecture search for 1-bit CNNs. In: IJCAI (2020)

  25. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)

  26. Maji, S., Rahtu, E., Kannala, J., Blaschko, M., Vedaldi, A.: Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151 (2013)

  27. Martinez, B., Yang, J., Bulat, A., Tzimiropoulos, G.: Training binary neural networks with real-to-binary convolutions. arXiv preprint arXiv:2003.11535 (2020)

  28. Nilsback, M.E., Zisserman, A.: Automated flower classification over a large number of classes. In: 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, pp. 722–729. IEEE (2008)

  29. Qin, H., et al.: Forward and backward information retention for accurate binary neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2250–2259 (2020)

  30. Rastegari, M., Ordonez, V., Redmon, J., Farhadi, A.: XNOR-Net: ImageNet classification using binary convolutional neural networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 525–542. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_32

  31. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y

  32. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)

  33. Song, L., Wu, J., Yang, M., Zhang, Q., Li, Y., Yuan, J.: Robust knowledge transfer via hybrid forward on the teacher-student model. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 2558–2566 (2021)

  34. Tolstikhin, I., et al.: MLP-mixer: an all-MLP architecture for vision. arXiv preprint arXiv:2105.01601 (2021)

  35. Touvron, H., et al.: ResMLP: feedforward networks for image classification with data-efficient training. arXiv preprint arXiv:2105.03404 (2021)

  36. Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jegou, H.: Training data-efficient image transformers & distillation through attention. In: International Conference on Machine Learning, vol. 139, pp. 10347–10357 (2021)

  37. Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200-2011 dataset (2011)

  38. Wu, B., et al.: Shift: a zero flop, zero parameter alternative to spatial convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9127–9135 (2018)

  39. Xiao, T., Dollar, P., Singh, M., Mintun, E., Darrell, T., Girshick, R.: Early convolutions help transformers see better. In: Advances in Neural Information Processing Systems, vol. 34 (2021)

  40. Xu, Y., Han, K., Xu, C., Tang, Y., Xu, C., Wang, Y.: Learning frequency domain approximation for binary neural networks. Adv. Neural Inf. Process. Syst. 34, 25553–25565 (2021)

  41. Yamamoto, K.: Learnable companding quantization for accurate low-bit neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5029–5038 (2021)

  42. Yang, Z., et al.: Searching for low-bit weights in quantized neural networks. Adv. Neural Inf. Process. Syst. 33, 4091–4102 (2020)

  43. Yu, T., Li, X., Cai, Y., Sun, M., Li, P.: S2-MLP: spatial-shift MLP architecture for vision. arXiv preprint arXiv:2106.07477 (2021)

  44. Zhang, D., Yang, J., Ye, D., Hua, G.: LQ-Nets: learned quantization for highly accurate and compact deep neural networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 365–382 (2018)

  45. Zhang, Z., Zhang, H., Zhao, L., Chen, T., Pfister, T.: Aggregating nested transformers. arXiv preprint arXiv:2105.12723 (2021)

  46. Zhao, K., et al.: Distribution adaptive INT8 quantization for training CNNs. In: Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (2021)

  47. Zheng, S., et al.: Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6881–6890 (2021)

  48. Zhou, S., Wu, Y., Ni, Z., Zhou, X., Wen, H., Zou, Y.: DoReFa-Net: training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160 (2016)

Acknowledgements

This work was supported by the National Natural Science Foundation of China (U20B2042 and 62076019) and the Aeronautical Science Fund (ASF) of China (2020Z071051001).

Author information

Correspondence to Yalong Jiang or Yufeng Wang.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 148 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Xing, X. et al. (2022). Towards Accurate Binary Neural Networks via Modeling Contextual Dependencies. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13671. Springer, Cham. https://doi.org/10.1007/978-3-031-20083-0_32

  • DOI: https://doi.org/10.1007/978-3-031-20083-0_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20082-3

  • Online ISBN: 978-3-031-20083-0

  • eBook Packages: Computer Science, Computer Science (R0)
