Sharing Weights in Shallow Layers via Rotation Group Equivariant Convolutions

Abstract

The convolution operation is equivariant to the translation group. To obtain additional group equivariances, rotation group equivariant convolutions (RGEC) have been proposed that are equivariant to both translations and rotations. However, previous work focused mainly on the number of parameters and usually ignored other resource costs. In this paper, we construct our networks without introducing extra resource costs. Specifically, a single convolution kernel is rotated to different orientations to extract features for multiple channels. Meanwhile, far fewer kernels than in previous work are used, so that the number of output channels does not increase. To further enhance the orthogonality of kernels in different orientations, we construct a non-maximum-suppression loss along the rotation dimension that suppresses all orientations except the most activated one. Considering that low-level features benefit more from rotational symmetry, we share weights via RGEC only in the shallow layers (SWSL). Extensive experiments on multiple datasets (i.e., ImageNet, CIFAR, and MNIST) demonstrate that SWSL effectively benefits from the higher degree of weight sharing and improves the performance of various networks, including plain and ResNet architectures. Meanwhile, the shallow layers use far fewer convolutional kernels and parameters (e.g., 75% and 87.5% fewer), and no extra computation costs are introduced.
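To make the weight-sharing arithmetic concrete, the Python sketch below (standard library only) expands each learned kernel into rotated copies, counts the learnable weights of a shallow layer, and computes a simple orientation-wise non-maximum-suppression penalty. It is an illustration under stated assumptions, not the authors' RGEC implementation: the helper names (`rot90`, `expand_orientations`, `shallow_layer_params`, `orientation_nms_penalty`) are hypothetical, rotations are restricted to multiples of 90 degrees so they are exact on the pixel grid, and the example layer size (64 output channels, 3 input channels, 3x3 kernels) and the penalty form (sum of absolute responses of all non-maximal orientations) are illustrative choices. With R orientations per learned kernel, the number of kernels needed for a fixed number of output channels drops by a factor of R, i.e., by 1 - 1/R; R = 4 and R = 8 give 75% and 87.5%, matching the reductions quoted in the abstract, although 8 orientations would need 45-degree rotations with interpolation, which this sketch does not implement.

```python
"""Hypothetical sketch of rotation-based weight sharing in a shallow conv layer.

Assumptions (not from the paper): orientations are multiples of 90 degrees, so
rotations are exact on the pixel grid, and the orientation NMS penalty is the
sum of absolute responses of all non-maximal orientations at one location.
"""
from typing import List

Kernel = List[List[float]]


def rot90(kernel: Kernel) -> Kernel:
    """Rotate a square kernel by 90 degrees counter-clockwise (same convention as numpy.rot90)."""
    n = len(kernel)
    return [[kernel[c][n - 1 - r] for c in range(n)] for r in range(n)]


def expand_orientations(base_kernels: List[Kernel], num_orientations: int = 4) -> List[Kernel]:
    """Expand each learned kernel into `num_orientations` rotated copies.

    Every copy feeds its own output channel, so the layer keeps its channel
    count while storing only len(base_kernels) learnable kernels.
    """
    if num_orientations not in (1, 2, 4):
        raise ValueError("exact grid rotations support only 1, 2 or 4 orientations")
    quarter_turns = 4 // num_orientations  # 90-degree turns between consecutive copies
    bank: List[Kernel] = []
    for kernel in base_kernels:
        current = kernel
        for _ in range(num_orientations):
            bank.append(current)
            for _ in range(quarter_turns):
                current = rot90(current)
    return bank


def shallow_layer_params(out_channels: int, in_channels: int, k: int,
                         num_orientations: int) -> int:
    """Learnable weights (bias excluded) when each kernel serves num_orientations channels."""
    if out_channels % num_orientations:
        raise ValueError("out_channels must be divisible by num_orientations")
    learned_kernels = out_channels // num_orientations
    return learned_kernels * in_channels * k * k


def orientation_nms_penalty(responses: List[float]) -> float:
    """Penalize every orientation response except the strongest one.

    `responses` holds the activations of the rotated copies of one kernel at one
    spatial location; minimizing the penalty pushes them toward a single winner.
    """
    winner = responses.index(max(responses))
    return sum(abs(r) for i, r in enumerate(responses) if i != winner)


if __name__ == "__main__":
    sobel_x = [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]]
    bank = expand_orientations([sobel_x], num_orientations=4)
    print(len(bank))  # 4 output channels from 1 learned kernel
    print(bank[1])    # [[1.0, 2.0, 1.0], [0.0, 0.0, 0.0], [-1.0, -2.0, -1.0]]

    standard = 64 * 3 * 3 * 3  # ordinary layer: 64 independent 3x3x3 kernels
    for r in (4, 8):
        shared = shallow_layer_params(64, 3, 3, r)
        print(r, shared, 1 - shared / standard)  # 4 432 0.75, then 8 216 0.875

    print(orientation_nms_penalty([0.9, 0.1, -0.2, 0.3]))
```

In this sketch the rotated copies are derived from the stored kernels rather than learned separately, so the output-channel count and the per-layer convolution cost match a standard layer; only the number of learnable weights shrinks.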

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61976209 and 62020106015), the CAS International Collaboration Key Project (No. 173211KYSB20190024), and the Strategic Priority Research Program of CAS (No. XDB32040000).

Author information

Corresponding author

Correspondence to Huiguang He.

Additional information

Colored figures are available in the online version at https://link.springer.com/journal/11633

Zhiqiang Chen received the B. Sc. degree in intelligence science and technology from the College of Automation, University of Science and Technology Beijing, China in 2014. He received the M. Sc. and Ph. D. degrees in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, China in 2017 and 2021, respectively. He is currently a postdoctoral fellow with the Beijing Academy of Artificial Intelligence and the Institute of Automation, Chinese Academy of Sciences, China.

His research interests include machine learning, brain-inspired intelligence, medical imaging, and computer vision.

Ting-Bing Xu received the B. Sc. degree in automation from China University of Petroleum, China in 2014, and the Ph. D. degree in computer applied technology from the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, China in 2020. He was a visiting researcher with the Department of Computer Science and Intelligent Systems, Osaka Prefecture University, Japan in 2018. He is currently a postdoctoral fellow with the School of Instrumentation and Optoelectronic Engineering, Beihang University, China.

His research interests include deep learning, machine learning, pattern recognition, handwriting recognition, and computer vision.

Jinpeng Li received the B. Eng. and M. Eng. degrees in automatic control from University of Science and Technology, China in 2012 and 2015, respectively. He received the Ph. D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, China in 2019. He is now a researcher at Ningbo HwaMei Hospital, University of Chinese Academy of Sciences, China, and the Institute of Life and Health Science, University of Chinese Academy of Sciences, China. He has authored or coauthored more than ten peer-reviewed papers in international journals and conferences.

His research interests include pattern recognition, machine learning, deep learning, transfer learning algorithms and their applications in epidemiology and medical image analysis.

Huiguang He received the B. Sc. and M. Sc. degrees from Dalian Maritime University (DMU), China in 1994 and 1997, respectively, and the Ph. D. degree (Hons.) in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, China. From 1997 to 1999, he was an associate lecturer with DMU. From 2003 to 2004, he was a postdoctoral researcher with the University of Rochester, USA. From 2014 to 2015, he was a visiting professor with the University of North Carolina at Chapel Hill, USA. He is currently a full professor with the Institute of Automation, Chinese Academy of Sciences. His research has been supported by several research grants from the National Natural Science Foundation of China. He has authored or coauthored more than 180 peer-reviewed papers. Dr. He was named an Excellent Member of the Youth Innovation Promotion Association, Chinese Academy of Sciences, in 2016. His Ph. D. dissertation was recognized as an Excellent Ph. D. Dissertation of the Chinese Academy of Sciences in 2004, and he received the National Science and Technology Award in 2003 and 2004, the Beijing Science and Technology Award in 2002 and 2003, the K.C. Wong Education Prizes in 2007 and 2009, and the Jia-Xi Lu Young Talent Prize in 2009.

His research interests include pattern recognition, medical image processing, and brain-computer interfaces (BCI).

About this article

Cite this article

Chen, Z., Xu, TB., Li, J. et al. Sharing Weights in Shallow Layers via Rotation Group Equivariant Convolutions. Mach. Intell. Res. 19, 115–126 (2022). https://doi.org/10.1007/s11633-022-1324-5

  • DOI: https://doi.org/10.1007/s11633-022-1324-5

Keywords

  • Convolutional neural networks (CNNs)
  • group equivariance
  • higher-degree weight sharing
  • parameter efficiency